PubTech Radar Scan: Issue 42

Usual mix of academic publishing tech news, launches, AI developments, and longer reads.

May 20, 2026

I’ve taken a bit of time away from the newsletter because life got in the way, and I’m slowly finding my way back. Today’s issue is a mix of what’s current with a sprinkling of things I have written over the past few months but never quite managed to assemble into a newsletter. As always, it’s a genuine privilege to have you reading.

You can catch me later today at PurePub.ai, where Ian Mulvany, Andrew Smeall, and Ann Michael will be exploring the emerging opportunities around AI in “What should publishers be talking about?”

🆕 News:

The PurePub.ai conference is up and running this week, and there’s still time to catch a couple of the sessions live.
STM is seeking feedback on its new brief Toward Responsible Use of Research Content in Generative AI, looking at how scholarly values and standards could be respected when using research content in and by GenAI systems and applications.
arXiv’s policy on the use of generative AI is, I think, a very sensible approach: “Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated.” Reading through the comments, especially from some of the more indignant researchers, is a treat. What, I have to check the paper that bears my name?
Scopus AI is now called ‘AI Discovery’, a name that must have taken the marketing team hours of hard thought to come up with 😉.

🚀 Launches:

Google’s Deep Research Max announcement explicitly names open-access peer-reviewed journals as one example of the authoritative sources its reports can draw on. I don’t keep a close eye on this, but is this one of the first mainstream AI product announcements to foreground open-access peer-reviewed content as the content pool?
Sci-Hub (widely treated by courts and publishers as a copyright-infringing shadow library) has launched a new AI-powered research assistant called Sci-Bot (sci-bot.ru). Reviews from scientists in Chemical & Engineering News are mixed; they note that Sci-Hub’s increasingly limited access to paywalled literature reduces the value of the tool. Nick Morley from Grounded AI analysed the references and found that, of the 514 references created, nearly 30% had some form of author misattribution.
My own quick test, asking for a history of Digital Science, was mixed: it covered the key points, included nothing beyond 2023, and contained some absolute clangers. For example, Daniel Hook wasn’t the founding CEO. Around 50% of core claims and references were accurate, 30% were inaccurate or misleading, and 20% would need more careful checking.
Springer Nature’s Cureus is launching the Journal of AI-Augmented Research in May 2026 as “an open and trusted home for researchers using AI as part of their research practice, with clear standards for transparency, reproducibility, and human-verified validation”. It was only a matter of time before research created using AI, and its associated OA fees, found new homes at major publishers.
ReviewBench is an open-source, venue-agnostic framework that compares human and AI reviews across structure, alignment with a paper’s major claims, impact, and critique category. [Preprint | GitHub]

🖧 Fabricated citations:

My feeds are full of reports about fabricated citations in research papers. I am quite curious to know how this October 2025 (!) Frontiers paper got published. 🤷
Max Topaz’s CITADEL site is a great piece of data science and visualisation summarising their research findings about fabricated citations.
Retraction Watch has a good summary of Topaz et al.’s research. I thought this was a good comment: “For instance, citations that play a minor role (e.g., placing the reported study within the context of existing literature; serving a rhetorical function) and which can be withdrawn without affecting the argument made or the validity of the results can be addressed in a correction that is transparent with respect to the authors’ lapse in vigilance.” (It’s surprisingly easy to get a fabricated reference when you do something like ask for the landmark study in JAMA about X.)
Tim Elfenbein asks what I think is the more important question: “Machine fabricated citations are a problem, but they are an extension of lots of other all-too-human problems w/ citations that have gotten little attention. What is worse: a completely fabricated citation or a misrepresentation of an existing cited work?”
A Nature analysis estimates that more than 110,000 scholarly publications from 2025 alone may contain fabricated references generated by AI. [Summary on LinkedIn]
I am feeling slightly wistful. Back in the day, there were teams of people (like me) working in ‘Electronic production’ departments manually checking that every URL in a reference list resolved to the right paper before it went online. Utterly mind-numbing but quite relaxing work that has largely been automated away (thankfully!). It is hard to believe how much work went into getting papers online 20+ years ago, and how much QA was done for fear that something might be incorrect online (which, to be fair, it often was).

🤖 AI:

IOPP’s new Duplicate Review Checker uses machine learning to identify reviewer reports with significant content overlap. Since its 2024 pilot, it has processed around 500,000 reports going back to 2020, flagging matches above a 60% similarity threshold. It catches both reports reused across multiple manuscripts and reports submitted under different reviewer names, with flagged reports routed to the Research Integrity team for review. Around 2,500 duplicates have been caught so far.
🎙️ Prathik Roy, Product Director for Data and AI Solutions at Springer Nature, on The Product Experience podcast, covers a lot of ground, including: “You’re effectively at an inflection point where you need to start thinking about whether you are building for humans to consume content via machines instead of humans directly consuming content off of your platforms.” So, how do you quantify the value of content when nobody visits your site anymore? Prathik walks through tracking content into AI pipelines, measuring what proportion of an LLM’s answer came from your sources, and eventually pricing by outcomes rather than access.
📊 COUNTER has released its Best Practice on Generative and Agentic AI usage metrics. See also the Research Information article by Tasha Mellins-Cohen.
The Foresight Institute is funding projects on this premise: “In addition to applying AI to specific problems, we need better platforms, tools and data infrastructure to accelerate AI-guided scientific progress generally. Similarly, to get our sense-making ready for rapid change, we are interested in funding work that applies AI to improve forecasting and general epistemic preparedness.”
🗎 Talip Gönülal, Ramazan Güçlü and Salih Güçlü explore AI In Academic Publishing for Non-Native English Speakers: The Good, the Bot, and the Ugly: “… the good, reducing linguistic inequalities by improving paper quality and decreasing language-related challenges; the bad, involving inaccurate or misleading AI suggestions, over-reliance on AI tools, and diminished engagement with manuscripts; and the ugly, characterized by failure to disclose AI use, lack of clear guidelines for responsible AI integration in research, homogenization of academic writing, and the emergence of new forms of inequality.”
James O’Sullivan on The AI detection delusion: “...the solution to one form of technological recklessness cannot be another. Detection tools give their users the feeling of objective certainty while delivering probabilistic guesses, and in doing so, they can cause real harm to real people—students who lose marks or face disciplinary action, professionals whose reputations are damaged, and writers whose command of English is held against them by an algorithm that mistakes simplicity for artificiality.”
Dave Flanagan at Wiley makes a related point in Provenance, not detection, arguing that the right question for research integrity isn’t “did AI write this?”, it’s “can we trust this author?”. Note Chris Reid’s comment: “Trust will be at the author layer.”
Dustin Smith, in Claude Code Is a Rogue Wave for Publishers, writes that AI chatbots have driven a ~30% increase in manuscript submissions, but that AI agents will make that look small. He estimates that if 15% of researchers adopt agentic practices by 2030, submission volumes will double. The third-order consequence, he argues, will be that the most valuable capability shifts from dissemination to rejection, routing, and authentication.

📚 Longer reads:

🎙️ I enjoyed, and learnt a lot from, this podcast with Andri Johnston, Digital Sustainability Lead at Cambridge University Press, talking in part about the environmental impact of AI.
Nova Techset CEO Yogesh Agarwal says we should pay more attention to the unglamorous operational work holding scholarly publishing together: editorial coordination, workflow handoffs, quality checks, all the stuff that doesn’t show up in dashboards but quietly keeps the system functioning. As submission volumes grow and AI gets layered in, Yogesh argues that this invisible work is multiplying rather than going away.
🗎 Gary Marcus flagged a new Nature paper at the TAIS meeting last week. The paper, “State media control influences large language models,” found that human annotators rated Chinese-language answers as more favourable to China in 75.3% of comparisons. This paper is about politics, but I think the findings might apply more broadly, for example, in fields where the Chinese-language and English-language framings of a topic might diverge, such as Traditional Chinese medicine. If you ask about acupuncture or moxibustion in English, and then in Chinese, would the same model give you different answers? Not because the underlying evidence differs but because the framing in Chinese-language sources tends to be more positive.
🔎︎ Aaron Tay on “We’re Good at Search”… Just Not the Kind That the AI era Demands — a Provocation, on how AI is changing librarians’ roles.
404 Media reports on Arizona State University’s rollout of “a platform called Atomic that creates AI-generated modules based on lectures taken from ASU faculty by cutting long videos down to very short clips then generating text and sections based on those clips.” I can see why this looked good on paper, and I can also see that this would be phenomenally difficult to do well with the tech we currently have.
Matthew Scott Goldstein on the scraper economy: “We track 15 companies systematically extracting publisher content to power AI systems. They are not all doing the same thing — but the result is identical. Publisher content flows in. Publisher compensation flows out. Here is how each tier works and what it looks like in practice. The market these 15 companies serve hit $1 billion in 2025 and is projected to reach $2 billion by 2030. Not one dollar of that flows back to publishers.”

And finally…

📖 Apparently there are enough novels about research integrity to fill a literature review. Lex Bouter has read 35 of them from the last 100 years, “with special attention to the research integrity topics, its drivers, and its consequences”. A summer holiday reading list? ⛱️

End Notes

If you found this useful, you can always buy me a coffee.
If you need consulting help navigating any of this, find me at Maverick.
