Martin Paul Eve has a new post on Zotero and auto-downloading open access books:

all I really wanted was to be able to embed an ISBN and a citation_pdf_url and have Zotero do the lookup and save the file. However, out of the box there is no easy way to do this.

His test book is quite interesting. It is his own new Close Reading with Computers: Textual Scholarship, Computational Formalism, and David Mitchell’s Cloud Atlas (April 2020), which applies textual computing to the science-fiction-philosophy novel Cloud Atlas.

I don’t know about or use the current version of Zotero, so I’m unsure what advantages it confers. I assume Eve intended to find a way to automatically harvest all CC BY-SA books in PDF, and build a local collection for automated analysis.

But I see his book is already on the OA book aggregator catalogue OAPEN. Theoretically then, since OAPEN is comprehensive and timely, one could have a harvester look at all the pages hanging off library.oapen.org/handle/ and save out only those pages with the required permissive CC “Rights” label on them. These pages each have a uniform PDF link URL in their HTML, in the form of library.oapen.org/bitstream/ and these could be easily extracted to a list. One would end up with a set of PDF links for a linkbot, ready to download to a local folder for computational analysis. I presume that’s what Eve intended to have Zotero do.
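The per-page step described above could be sketched roughly as follows, in Python. This is only an illustration under assumptions: it takes the HTML of one record page as a string (fetching is left out), assumes the CC licence URL appears somewhere in that HTML, and assumes the PDF link is an ordinary anchor whose href contains /bitstream/. The function and class names, and the sample page with its made-up path, are hypothetical and not taken from OAPEN.

```python
from html.parser import HTMLParser

# Assumed markers, taken from the URL fragments mentioned in the post.
LICENCE_MARKER = "creativecommons.org/licenses/by-sa/"
PDF_MARKER = "/bitstream/"


class BitstreamLinkParser(HTMLParser):
    """Collects href values of <a> tags that point at /bitstream/ files."""

    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if PDF_MARKER in href:
            self.pdf_links.append(href)


def pdf_links_if_permissive(html_text):
    """Return the page's bitstream links, but only if the page carries
    the required licence URL; otherwise return an empty list."""
    if LICENCE_MARKER not in html_text:
        return []
    parser = BitstreamLinkParser()
    parser.feed(html_text)
    return parser.pdf_links


# Hypothetical record page, invented for illustration only.
SAMPLE_PAGE = """
<html><body>
<a href="https://creativecommons.org/licenses/by-sa/4.0/">Rights</a>
<a href="/bitstream/handle/0000/example.pdf">Download</a>
</body></html>
"""

if __name__ == "__main__":
    for link in pdf_links_if_permissive(SAMPLE_PAGE):
        print(link)
```

Checking the licence on the record page before collecting any link keeps the decision tied to the catalogue's "Rights" label rather than to whatever the PDF says about itself.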

One would need to reference the OAPEN record page first, in the way I’ve suggested, since the PDF itself can have different, non-uniform or contradictory licence information. For instance, in its interior Eve’s book is labelled as both “©” … “No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or in any information storage or retrieval system without the prior written permission of Stanford University Press.” and also “Creative Commons Attribution-ShareAlike 4.0”.

How many items on OAPEN have a creativecommons.org/licenses/by-sa/ “Rights” label at present, as Martin’s book does? A Google site: search suggests around 650 titles. Half an hour of my filtering the OAPEN CSV suggests it’s actually just over 3,000 under some form of permissive CC that permits commercial use. That’s still a manageable harvest at present. But as the supply of OA books and monographs grows rapidly, the likely result of various OA mandates in the near future, it might be a useful time-saver for text-miners and digital humanists if OAPEN were to maintain a single torrent of all the PDFs, inside which a half dozen folders would neatly organise the books by CC licence type. Such a one-click solution might save a lot of faffing around with digging into and filtering their XML and CSV feeds, wrangling with harvester scripts and timeouts, or trying to wrestle with third-party services such as Zotero. A torrent could also save OAPEN’s bandwidth.
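For anyone wanting to repeat the CSV filtering mentioned above, here is a rough Python sketch of how it might go. It groups rows by CC licence type and drops anything with a non-commercial (NC) clause. The column names "title" and "license" and the sample rows are invented for illustration; the real OAPEN CSV will have its own headers, so treat those as assumptions to adjust.

```python
import csv
import io
import re
from collections import defaultdict

# Pulls the licence code (e.g. "by-sa") out of a Creative Commons URL.
LICENCE_RE = re.compile(r"creativecommons\.org/licenses/([a-z-]+)/")


def group_by_licence(csv_text, licence_column="license", commercial_only=True):
    """Group CSV rows by CC licence code, e.g. {"by": [...], "by-sa": [...]}.

    Rows with no recognisable CC licence URL are skipped. When
    commercial_only is True, licences with an NC clause are skipped too.
    The default column name is an assumption, not OAPEN's real header.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = LICENCE_RE.search(row.get(licence_column) or "")
        if not match:
            continue
        code = match.group(1)
        if commercial_only and "nc" in code.split("-"):
            continue
        groups[code].append(row)
    return dict(groups)


# Invented sample data, for illustration only.
SAMPLE_CSV = """\
title,license
Book A,https://creativecommons.org/licenses/by/4.0/
Book B,https://creativecommons.org/licenses/by-sa/4.0/
Book C,https://creativecommons.org/licenses/by-nc-nd/4.0/
Book D,
"""

if __name__ == "__main__":
    for code, rows in sorted(group_by_licence(SAMPLE_CSV).items()):
        print(code, len(rows))
```

The resulting groups map naturally onto the per-licence folders suggested for a torrent.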