A new free book from a UCLA historian, Athena Unbound: Why and How Scholarly Knowledge Should Be Free for All. Partly a history (no mention of JURN, though), and partly another stab at ‘how to make OA work’ in the future. There’s also a podcast interview with the author, albeit one that reveals some rather interesting assumptions. Such as…
“ChatGPT as I understand it at the moment scrapes and feeds off of the crappy end of the Web … I don’t think it’s able to get past the paywalls and into the scholarly databases and into the journals, as far as I know. So insofar as that’s true, then all we’re getting is a garbage-in, garbage-out product from ChatGPT … good ChatGPT should be based on the stuff that right now the paywalls keep us out of.”
The idea that worthy content is only to be found behind a paywall will raise an eyebrow among many OA publishers and indexers. He also makes the even more questionable assumption that piracy no longer exists in non-academic content (movies, games, TV, software, comics, instructional videos etc.). But those assumptions aside, his core points are thought-provoking…
i) It certainly would be interesting if an AI could be trained purely on a critical mass of non-science / non-medical academic journal texts. On, say… Sci-Hub’s PDFs, Semantic Scholar’s PDFs (which I’m assuming subsume the DOAJ’s relatively small PDF holdings), and perhaps even all the PDFs that could theoretically be harvested after spidering JURN’s index URLs. So far as I’m aware, even amid the blisteringly fast development of AIs, there’s nothing like that just yet. None of those three sources gives complete coverage, of course. But even in a partial early form such an AI would be interesting to have. (A rough sketch of the harvesting step follows after point ii, below.)
ii) He also raises the question of copyright in the output of such journal-ingesting AIs. If the pure unaltered text product of an AI cannot be copyrighted, he suggests that many will come to prefer the AI’s potted answers over struggling with the actual (paid) articles from which it was hashed. I’d add that what they most likely won’t prefer to do is then laboriously hand-check the AI’s factual claims, logic, references and suchlike, any of which may trip them up in a follow-on use of the text. The same goes for the errors of taste and historical knowledge that will likely occur with scholarly arts/humanities AIs, such as we already see in dumb taste-matching software on store sites, which might assume that Ziggy-era Bowie is the same as Eno-era Bowie and Tin Machine-era Bowie, or that if you like The Hobbit you will also enjoy The Silmarillion.
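If anything like the journal-trained AI in point i were attempted, the unglamorous first step would be corpus harvesting. The following is a minimal, hypothetical Python sketch of that step only: the seed URL, output folder and delay are all invented for illustration, and a real spider working across JURN-scale indexes would also need robots.txt handling, de-duplication and far more robust error handling.

```python
# Hypothetical sketch only: harvest PDFs linked from a list of index
# pages, in the spirit of point i. SEED_URLS and OUT_DIR are invented.
import os
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SEED_URLS = ["https://example.org/journal-index.html"]  # placeholder seeds
OUT_DIR = "corpus_pdfs"

def harvest(seed_urls, out_dir=OUT_DIR, delay=2.0):
    os.makedirs(out_dir, exist_ok=True)
    for page_url in seed_urls:
        page = requests.get(page_url, timeout=30)
        soup = BeautifulSoup(page.text, "html.parser")
        for a in soup.find_all("a", href=True):
            href = urljoin(page_url, a["href"])
            # Keep only direct links to PDF files on the index page.
            if not href.lower().endswith(".pdf"):
                continue
            pdf = requests.get(href, timeout=60)
            if pdf.status_code == 200:
                name = href.rsplit("/", 1)[-1]
                with open(os.path.join(out_dir, name), "wb") as f:
                    f.write(pdf.content)
            time.sleep(delay)  # be polite to the host

harvest(SEED_URLS)
```

Following only direct .pdf links keeps the sketch short; in practice many journal sites interpose landing pages, which is one more reason complete coverage is so hard to achieve.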
That said, Elon Musk and others are already reported to be working on fact-checking and checkable ‘citation finding’ AIs. Daisy-chained workflows between very different AIs will likely emerge, and doubtless there will even be AIs which can suggest and optimise such daisy-chains. Part of such chains will likely be AI modules which try to strip out “AI-ness”, steganographic watermarking and suchlike, and attempt to add “human-ness” to the look and feel of the saleable end product. Perhaps there will even be filters for glaring “errors of taste” in matters relating to art and literature.
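To make the daisy-chain idea concrete, here is a hypothetical Python sketch of the plumbing. Every stage name (draft, fact_check, find_citations, humanise) is invented, and each body is a stub standing in for a very different model or service; the only point being illustrated is that the output of one AI feeds the next.

```python
# Hypothetical sketch: a daisy-chained workflow of AI stages.
# All stage names and bodies are invented placeholders.
from typing import Callable, List

Stage = Callable[[str], str]

def draft(text: str) -> str:
    return text          # stand-in for a generative drafting model

def fact_check(text: str) -> str:
    return text          # stand-in for a claim-verification model

def find_citations(text: str) -> str:
    return text          # stand-in for a checkable citation-finder

def humanise(text: str) -> str:
    return text          # stand-in for stripping "AI-ness" and watermarks

def run_chain(stages: List[Stage], prompt: str) -> str:
    out = prompt
    for stage in stages:
        out = stage(out)  # each AI's output becomes the next one's input
    return out

result = run_chain([draft, fact_check, find_citations, humanise],
                   "Summarise the reception of Athena Unbound.")
```

An AI that suggests and optimises such daisy-chains would then, in effect, be searching over orderings and combinations of stages like these.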