PubTech Radar Scan: Issue 24
Things that have caught my attention over the last couple of weeks include STORM, Proemial, the Signals launch, The Conversations from Wiley, professors replacing purchased class readings with AI-generated ones, and more. I’ve also included some thoughts about technology-enhanced peer review.
🚀 New/News/New to me
STORM is an experiment from Stanford to create Wikipedia-like reports on a topic. I liked seeing the sources the service was referring to and how it generated alternative views on a topic, but I found the results underwhelming and a bit off. Here’s a 100% generated article about AI in peer review and another on automating peer review.
Proemial is “building a platform that enables anyone to understand scientific research - and apply the insights in their daily lives; in the classroom, boardroom, lab, newsroom, or at the dining table”.
Signals has launched a free research-integrity submissions-evaluation platform to help restore trust in research.
Keith Riegert on how he’s using ChatGPT to write the formula for turning an ISBN-13 into an ISBN-10, included here to remind myself that small hacks can also be incredibly helpful.
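For anyone curious what that conversion actually involves, here’s a minimal sketch in Python (the function name and error handling are my own, not from Keith’s post): drop the 978 prefix and the old check digit, then recompute the ISBN-10 check digit from the remaining nine digits.

```python
def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert a 978-prefixed ISBN-13 to its ISBN-10 equivalent."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or not digits.startswith("978"):
        raise ValueError("Only 978-prefixed ISBN-13s have an ISBN-10 form")
    body = digits[3:12]  # nine core digits: drop '978' and the ISBN-13 check digit
    # ISBN-10 check digit: weighted sum with weights 10 down to 2, modulo 11
    total = sum(int(d) * w for d, w in zip(body, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))

print(isbn13_to_isbn10("9780306406157"))  # → 0306406152
```

A remainder of 10 is written as “X” in ISBN-10, which is why the check digit isn’t always numeric.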
👥 Peer review tech & peer review week
Here are a couple of articles that caught my attention:
Simone Ragavooloo’s If generative AI accelerates science, peer review needs to catch up “Two things are clear. First, peer review cannot be sustained in its current state as AI increases science output. Second, as the volume of research grows, collaboration, as well as innovation, is essential to protect scientific discourse and the integrity of the scientific record.”
From Bottleneck to Breakthrough: AI’s Role in the Future of Peer Review by Zeger Karssen and Is AI the Answer to Peer Review Problems, or the Problem Itself? by Christopher Leonard
David Worlock on AI and scholarly publishing: unfashionable glimpses of hope “Is anyone under any doubt that we will create fully automated peer review systems which operate more successful than human beings?” I’m in no doubt that we will but I think we need to be clear about why we’re automating peer review and what we want our organizations to look like in the future.
What I took away from Peer Review Week was a question about the role of research communities in peer review and an uneasiness about where a technology-enhanced future might take us. This is very simplistic, but imagine the following two scenarios:
Scenario 1: Large commercial publishers design a streamlined/optimized/one-size-fits-all peer review process which reduces costs and increases efficiency. The process is fast, rigorous and robust, but there’s only one workflow. Experimentation, such as the Stacks Journal’s community peer-review process, is tricky because new services can only be added if a significant number of journals will use them and sufficient cost/time savings are expected.
Scenario 2: Research communities drive the types of peer review and technologies needed to support the needs of their communities. Communities have their own quirks and ways of doing things so the reasons for automating are many and reflect the needs of each community. Experimentation flourishes, some communities no longer have journals and have moved to 100% automated writing/review. Some journals have automated screening for research integrity but remain 100% human-reviewed.
I think scenario 1 is more likely and will bring many benefits, such as more rigorous research integrity checks, but I also think the primary beneficiaries will be large publishers, not the research communities they serve.
✍️ AI writing/paper generation
Richard Wynne’s Quick experiment using fake data and AI to generate a fake scholarly manuscript shows how quick/easy it is to fake a paper. The experiment was inspired by a news article in the FT about how artificial intelligence is being used to help biotech researchers plan experiments and better predict outcomes.
Worth reading the comments about Ethan Mollick’s post “At OpenAI's education event last night, heard from a couple professors who were replacing purchased class readings with AI-generated (importantly, vetted/edited by the professor!) customized readings that synthesized a lot of content and were designed to better fit the syllabus”. I liked this one from Karl Moll, “I think this is a watershed moment for content/data providers: in this example, publisher's will need to start providing training data, benchmarking questions, etc etc as the commercial artifact alongside a book - that way the educator and student will be able to customize and consume the content that works best for the lesson, AND ALSO properly attributes & compensates the work of author/publisher”.
📰 Longer reads/listens/watches
The Conversations from Wiley is a good watch that covers a lot of ground. Two questions struck a chord with me:
1: Should we be worried about LLM-related tech disrupting usage-based models? I think we should, but how things might change is a much more complex question. I have a suspicion that LLM technologies, combined with changes in access-control tech [a likely switch from IP access to individual logins] and some new kind of metric, will enable the usage model to prevail; COUNTER has done a good job of adapting to change.
2: How do we handle the publication of more data than currently appears in the PDF, and how do we make that data accessible? There’s a lot of really good thinking being done in this area. The Content Profile/Linked Document standard would be relatively straightforward for many publishers to adopt. The work of Barend Mons and LIFES is an alternative model. I think the key issue here is not technology and standards, but how to get the research community to adopt this without creating a huge amount of work for researchers.

AI and the future of behavioural science. Alexandra Chesterfield, Elisabeth Costa, Professor Oliver Hauser, Dr Dario Krpan, Professor Susan Michie, and Professor Robert West talk about how AI is transforming research at an LSE event.
I enjoyed Ethan Mollick’s new book Co-Intelligence about living and working with generative AI. If you read Ethan’s Substack you’ll have read most of this before, but it’s an engaging and well-written intro to GenAI and how to use it. I was staggered to learn that in 2017 it was estimated that 20,000 people in Kenya were employed as full-time academic writers to write essays and dissertations. Curious to know if GenAI has replaced these jobs.
The differing perspectives in these two podcasts are a good challenge. Both discuss a demo of a service created by Harvard students that uses augmented-reality glasses and AI to identify individuals and pull up extensive personal information from public databases. Listen to This Week in Start-ups for info on how the students built the service and the wonderful throw-away comment “...we have a lot of familiarity with are LLMs. Every college student is familiar with [ChatGPT]”. Then listen to 404 Media discuss the serious privacy issues with this kind of tech.
🚲 And finally…
I had a fantastic trip to Romania, cycling around 500km along the Black Sea coast and the Danube. I started at Constanta and rode to Tulcea, then took the ferry to Kilometer 0 of the Danube to visit the Danube Delta in Sulina. Then I made my way upstream on Eurovelo 6 through sleepy rural villages, visiting lots of archaeological sites/museums. I got as far as Zimnicea before a sudden death in the family necessitated a return home. Hopefully, I will make it to Golubac, the Iron Gates Gorge and Lepenski Vir next year!