PubTech Radar Scan: Issue 31
News, some belated launch info (Data Catalyst newsletter, Scinito, Reedy), AI articles, a distillation of SN’s annual report by ChatGPT, longer reads, & copyright deniers bingo.
News
🔢 The Wikimedia Foundation has teamed up with Kaggle to launch streamlined Wikipedia datasets for AI training, hoping to ease the load on Wikipedia’s servers. I’m curious to see if it’ll catch on; Wikipedia might just have enough clout to make it work. It’s been nearly 20 years since platforms like HighWire tried offering similar services to publishers, only to be roundly ignored. The Data Licensing Alliance gave it another go in 2021 with much the same result. Perhaps the third time’s the charm?
🔧 Open knowledge doesn’t preserve itself. Rosalyn Metz on The Fight for Open Infrastructure Starts Now (Actually, It Started Yesterday)
🍪 “After more than half a decade of delayed timelines, government oversight and industry infighting, Google has fulfilled many expectations: a U-turn on third-party cookies in Chrome.” Digiday, on how the ad-supported media ecosystem has responded to what, in hindsight, has been a costly distraction.
Launches
📈 The Data Catalyst newsletter: thoughts and news on data quality, semantic interoperability, AI, FAIR data, and more (healthcare and life sciences focus), by Andrea Splendiani.
🔍 SCiNiTO (this has been around for a while, but it’s new to me) offers full-text access to Open Access sources and institutional holdings, plus a few AI bells and whistles: an AI Chat for research assistance, an AI Reviewer Agent for manuscript evaluation, and a Journal Recommendation system to help you find the best place to publish.
📚 Reedy (and here I’m even slower off the mark!) is an AI-driven platform designed to simplify and optimise metadata management for books.
AI
👉 London-based? Don’t forget our next in-person AI in Publishing Collective meetups on 29 April and 14 May.
📊 Leslie McIntosh on why trusted data matters. “Large, linked datasets fuel AI tools, and that’s exactly why trusted data matters. Without it, we can't expect trustworthy outputs.”
📹 Michael Upshall has a nice write-up of the Fiesole Retreat 2025. I love this anecdote: "Paris O’Donnell, from the library of I Tatti, described a clever way of checking for missing or incorrectly shelved books in the library: simply videoing the sequence of books on the shelf and comparing it, using vibe-coded AI, to the library catalogue."
🤖 Florent Daudens writes about what might happen to news publishing if people stop actively using the web. To paraphrase:
Picture this: AI agents will be able to browse the web for us faster and better than us, without seeing the ads. We're producing more content than ever. AI will intermediate content for us, be it with our own tailored agents or without us realizing it (à la TikTok).
Add these three trends: models are being commoditized, so tech giants are racing past basic models to build products that will capture user attention; smart devices will embed smaller but more capable models, giving users unprecedented control over how they want to get their news and content; and our culture is rapidly shifting from written to oral communication.
This could mean the death of the web as we know it, as well as web advertising, just like digital almost killed print.
If people stop using the web (a big if), I think the impact on academic publishing will likely be slower and more nuanced than in news media. I do think AI-intermediated content will present a real challenge, though, especially as aggregation risks stripping away important trust indicators and context (see Leslie McIntosh’s article above). The point about our culture shifting from written to oral communication is particularly interesting to think about.
🤩 AI tools are overhyping research results, according to a new study, with the newer models being the worst offenders. Not exactly surprising; it takes skill to summarise findings without stripping out all the nuance (I used to do this in my first publishing job and never got it quite right…).
ChatGPT’s distillation of Springer Nature’s Annual Report
Annual reports are curious things — they tell the story a company wants you to hear, which isn’t always quite the same as what’s actually going on. Combine one of these reports with ChatGPT’s boundless enthusiasm, and you get the following Technology & AI Strategy Deep Dive based on Springer Nature’s FY2024 results:
🔍 Executive Summary
Springer Nature (SN) is rapidly evolving from a tech-enabled publisher to a tech-native operator. Their 2024 results showcase significant progress in applying AI and automation at scale across editorial, production, and product functions. Unlike many legacy peers still piloting GenAI, SN has embedded these capabilities across key revenue and operational workflows.
SN’s public commitment to “90+ AI pilots” masks a more sophisticated maturity: they are now internalizing tooling and platform components that structurally reduce cost, compress cycle times, and subtly redirect user behavior (authors/editors/librarians). Competitor publishers should monitor closely not just what SN builds, but how it operationalizes and integrates it.
📈 Key Technology Takeaways
1. AI at Scale: From Pilots to Process
90+ AI pilots run across all business units, indicating strategic experimentation beyond Research.
Tools such as Snapp (submission-to-accept platform) now handle >1 million submissions annually—a custom-built backbone for their editorial operations.
AI is no longer just R&D—it is production-grade and business-critical.
Implication: SN is outpacing competitors in turning GenAI into operational savings and workflow automation. Expect their cost base to become structurally leaner over time.
2. Tool Ecosystem: Modular & Use-Case Specific
SN has deployed a suite of targeted AI tools, each designed for a key pain point in the publishing lifecycle:
Implication: Springer’s tooling is focused on automation and user retention—each tool solves a workflow problem while also reinforcing author/editor loyalty within their ecosystem.
3. AI as a Product: Nature AI Assistant
SN piloted its Nature-branded AI assistant with ~600 researchers in 2024.
81% said it saved them time in reading/writing tasks.
Open beta launch is expected in Q2 2025.
Implication: This shifts SN’s AI from back-office tooling to revenue-generating user experiences. Positioning AI as a branded companion (not just functionality) opens doors to subscription, licensing, or institutional bundling models.
4. Data + Design: Subtle Platform Power Moves
eBook usage up 35% YoY—driven by data-informed library offerings like “Access & Select.”
SN uses interaction data to refine product design, automate recommendations, and drive purchasing decisions.
Internal rollout of an AI Academy trained 2,000+ employees.
Implication: SN’s tech is not only externalized—it’s internalized. Their workforce is being upskilled to normalize AI in everyday decisions, likely accelerating future tooling uptake.
5. Digital Maturity by Segment
Implication: Research is SN’s innovation engine. EdTech is a secondary frontier. Health remains a laggard, with limited signs of disruptive tech.
🧭 Strategic Watchpoints for Competitors
Cost Curve Advantage: SN’s 60% production cost savings via ACDCx will pressure competitors still using third-party typesetting vendors.
Sticky Ecosystem Design: Tools like T-Rex and Journal Finder reinforce author loyalty and increase submission velocity—a long-term market share play.
From Service to Product: Nature AI Assistant may evolve into a sellable subscription or institutional tool, unlocking new direct researcher revenues.
Employee Enablement: The internal AI Academy suggests a deeper change management program—expect faster rollout of future tools than peers.
Branded AI Arms Race: Nature branding gives SN an edge in trust and adoption. Expect other publishers to rush to brand their own AI companions.
🧠 Analyst Commentary
“Their innovation narrative is unusually coherent—from TAs and OA to automation and AI assistants. Most publishers can’t string that together yet.”
Here’s how ChatGPT sizes up the technology strategies at Springer Nature, Elsevier, Wiley, and Taylor & Francis. I think there’s some truth to the comparison, though Wiley and Taylor & Francis might see things a little differently!
Longer reads
🔖 WIRED’s article explores how arXiv has quietly transformed global science by enabling researchers to share preprints quickly and freely. arXiv’s impact has been immense despite a rather retro façade.
🔖 Is it time to trade in the science journal? Once the reliable Volvo of research, is it becoming more Morris Minor-like: charming but entirely unsuited to the motorway of modern science? Worlock delivers ten firm nudges toward the exit, citing slow speeds, spiralling costs, and a distinct allergy to AI.
🔖 Money, Trust and Tools: How Two New Projects Aim to Strengthen Diamond Open Access, by Elena Di Rosa (H/T: Samuel Moore on Bluesky)
🔖 In 2025, Gen AI seems intent on helping us feel a bit more together, or at least a bit less scattered. Top uses now include therapy, life admin, and finding purpose, while text editing and content polishing have quietly shuffled down the rankings. 📈 Interesting related stat: character.ai reportedly handles 20,000 queries per second, around a fifth of Google’s search volume.
🔖 Academic Databases and the Art of the Overcharge. “Clarivate, Elsevier, and the American Chemical Society are comfortable pursuing a strategy of pricing discrimination. But there is no reason libraries should allow this approach.” Alternatives like The Lens also exist.
And finally…
😂🎯 Every AI copyright argument you’ve ever heard in bingo form. Genius from Graham Lovelace: