Google kills the cache

Google is killing off its “cached” page feature, reports Ars Technica. The feature kept a copy of a page for a few hours, days or weeks, and sometimes longer. The burden of preservation will now largely fall on the Internet Archive’s Wayback Machine, making that service more vital than ever. However, the Wayback Machine has its limits, said to be ‘100 saves per day, per IP address’. Thus if your ISP puts you on a shared IP address, you could be out of luck that day.

There’s also Archive.is, though there can be a queue 1,000 users long to archive a page. It’s otherwise fast, and it also saves a screenshot. There are a few others, such as Perma.cc.

It might have been nice if Google had also bunged the Internet Archive $100m or so, to help them take up the slack, but Google seems to be a bit hard up these days. Ars Technica suggests the cache killing is a cost-saving move.

OLMo-7b

The Allen Institute for AI has open-sourced its OLMo AI models for text generation. Funded by the wealth of Microsoft’s Paul Allen, the Institute runs the huge free Semantic Scholar ‘academic papers’ search harvester and database, and also has an AI arm. Its AI models are radically ‘open’ under an Apache licence, and are available for free use, including commercialisation. My guess would be that OLMo may be especially useful for academic text and semantics?

Added to JURN

Sehnsucht : The C.S. Lewis Journal

Christian Librarian, The

Principia : A Journal of Classical Education

Journal of the Northern Renaissance

Digital Enlightenment Studies

Bibliomanie

Aristotelica

Scripta Classica Israelica

Hieroglyphs

KIU Journal of Humanities (Uganda)

Contemporary Eurasia

Occasional Papers on Religion in Eastern Europe

Cesura (Central Europe)

Annual of Natural Sciences Department (New Bulgarian University)

Aquileia Nostra

Israel Museum Studies in Archaeology

Africa Habitat Review (African planning and the built environment)

Iluminace : The Journal of Film Theory, History, and Aesthetics (Czech)

Art Style : Art & Culture International Magazine

Gulf Coast Journal : a journal of literature and the fine arts (1982-2013) (University of Houston)

Texaco Star (1913-1963)

Shell News (1939-1959)

Phytopathology (plant diseases)

How to select just the wanted files in an Archive.org torrent

Here is how to select just the wanted files for a LibriVox public-domain audiobook, when downloading via an Archive.org torrent. I’m using the popular free qBittorrent software.

It’s a bit tricky to get just a certain set of files downloading and not the others. This is how it can be done:

1. Download the .torrent file.

2. Start the entire torrent. In the Content panel, immediately deselect all the torrent’s files.

3. Now filter the file set for “128kb.mp3” or whatever other standard naming you have in the set for the highest-quality audio files.

4. Shift-select all these filtered files, so that they’re highlighted. Don’t attempt to tick them all (there may be hundreds). Instead right-click them as a selected block and set them to priority “Normal”. qBittorrent will now consider the files “ticked” and active.

5. Start the torrent. You should see that only the desired files are downloading.
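
For those who would rather script the selection, the filtering logic in the steps above can be sketched in a few lines of Python. This is only a rough illustration and does not talk to qBittorrent at all. The file names are made up, the function name plan_priorities is hypothetical, and the priority numbers (0 for ‘do not download’, 1 for ‘normal’) are my assumption about how qBittorrent numbers its priorities.

```python
# Assumed priority values (0 = do not download, 1 = normal).
# These mirror the "deselect" and "Normal" choices in the steps above.
SKIP = 0
NORMAL = 1


def plan_priorities(file_names, wanted="128kb.mp3"):
    """Return one priority per file: NORMAL if the name contains
    the wanted text (case-insensitive), otherwise SKIP."""
    needle = wanted.lower()
    return [NORMAL if needle in name.lower() else SKIP for name in file_names]


# Hypothetical file names, in the style of a LibriVox item.
files = [
    "alice_01_carroll_128kb.mp3",
    "alice_01_carroll_64kb.mp3",
    "alice_01_carroll.ogg",
    "alice_02_carroll_128kb.mp3",
    "alice_meta.xml",
]

priorities = plan_priorities(files)
for name, priority in zip(files, priorities):
    print(priority, name)
```

Only the two “128kb.mp3” files come out with priority 1, which is the same result as steps 2 to 4 give in the Content panel.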

Download CSV from any HTML table

I could have used this one the other day, and now it exists: a new userscript to Download CSV from any table on a website. The code looks clean to me, and it works.

Especially useful for re-sorting ‘non re-sortable tables’. Sadly it doesn’t work with GitHub file lists.

You may need to stop it running on some sites, by adding a line such as this to the userscript’s header.

// @exclude https://www.etools.ch/

Alternatively, just disable it and only turn it on when needed.

Note that the paid Windows utility ABBYY Screenshot Reader can also OCR a table and save it as a .CSV file. Possibly useful for those times when the table is a graphic.
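
For a page that has already been saved as HTML, the same table-to-CSV idea can be roughed out with nothing but the Python standard library. To be clear, this is not the userscript’s code, just a minimal sketch of the general technique. The names TableRows and html_table_to_csv are my own, the sample table is made up, and it assumes a simple table with no nested tables or merged cells.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text pieces of the open cell, or None

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            # Join the pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()   # copes with a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # copes with a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()


def html_table_to_csv(html):
    """Return the table rows found in the HTML as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample table, for illustration only.
sample = """
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>First item</td><td>1913</td></tr>
  <tr><td>Second item, revised</td><td>1939</td></tr>
</table>
"""

print(html_table_to_csv(sample))
```

The csv module adds the quoting for any cell that contains a comma, so the result should open cleanly in a spreadsheet for re-sorting.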

Llamafile – runs LLM AIs as a single .exe

The Mozilla organisation has released a way to make Large Language Model AIs (think ‘ChatGPT’) into normal single-file .EXE files that will run on Windows, Mac, Linux, etc. Llamafile is open source and available now. Sample .EXE files include WizardCoder-Python-13B. Sadly, though, that one won’t run on Windows, as that OS has a 4GB limit on the size of .EXE files. So near, yet so far.