How to select just the wanted files in an Archive.org torrent

How to select the wanted files for a LibriVox public-domain audiobook, when downloading via an Archive.org torrent. I’m using the popular free qBittorrent software.

It’s a bit tricky to get just a certain set of files downloading and not the others. This is how it can be done:

1. Download the .torrent file.

2. Start the entire torrent. In the Content panel, immediately deselect all the torrent’s files.

3. Now filter the file set for “128kb.mp3” or whatever other standard naming you have in the set for the highest-quality audio files.

4. Shift-select all these filtered files, so that they’re highlighted. Don’t attempt to tick them all (there may be hundreds). Instead right-click them as a selected block and set them to priority “Normal”. qBittorrent will now consider the files “ticked” and active.

5. Start the torrent. You should see that only the desired files are downloading.
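For repeat jobs, the same selection can be scripted against qBittorrent’s Web API. A sketch using the third-party qbittorrent-api package; the suffix, the passed-in client, and the assumption that file ids match list positions are mine, not part of the steps above:

```python
def wanted(names, suffix="128kb.mp3"):
    """Pick just the highest-quality audio files by filename suffix."""
    return [n for n in names if n.endswith(suffix)]

def select_only(client, torrent_hash, suffix="128kb.mp3"):
    """Set the wanted files to Normal priority, everything else to Skip.

    `client` is a logged-in qbittorrentapi.Client (third-party package).
    """
    files = client.torrents_files(torrent_hash=torrent_hash)
    names = [f.name for f in files]
    keep = set(wanted(names, suffix))
    skip_ids = [i for i, n in enumerate(names) if n not in keep]
    keep_ids = [i for i, n in enumerate(names) if n in keep]
    # Priority 0 = do not download (“unticked”), 1 = normal.
    client.torrents_file_priority(torrent_hash=torrent_hash,
                                  file_ids=skip_ids, priority=0)
    client.torrents_file_priority(torrent_hash=torrent_hash,
                                  file_ids=keep_ids, priority=1)
```

The pure `wanted()` filter mirrors step 3’s “128kb.mp3” filter box.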

Download CSV from any HTML table

I could have used this one the other day. Now it exists: a new userscript to download a CSV from any table on a website. The code looks clean to me, and it works.

Especially useful for re-sorting ‘non re-sortable’ tables. Though sadly it doesn’t work with GitHub file lists.

You may need to stop it running on some sites, by adding a line like this to the userscript’s header.

// @exclude https://www.etools.ch/

Alternatively, just disable it and only turn it on when needed.

Note that the paid Windows utility ABBYY Screenshot Reader can also OCR a table and save it as a .CSV file. Possibly useful for those times when the table is a graphic.
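The heart of any such table-to-CSV tool is a walk over the table’s rows and cells. A minimal Python sketch using only the standard library (it handles simple tables; nested tables and colspans are out of scope):

```python
import csv
import io
from html.parser import HTMLParser

class TableToCSV(HTMLParser):
    """Collect the cell text of <tr>/<td>/<th> elements as rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell or []).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def table_to_csv(html):
    """Return the first table in `html` as CSV text."""
    parser = TableToCSV()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```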

Llamafile – runs LLM AIs as a single .exe

The Mozilla organisation has released a way to package large language model AIs (think ‘ChatGPT’) as normal single-file .EXE files that will run on Windows, Mac, Linux, etc. Llamafile is open source and available now. Sample .EXE files include WizardCoder-Python-13B. Sadly that one won’t run on Windows, as that OS has a 4GB limit on the size of .EXE files. So near, yet so far.

Blog moved and fixed

Ok, well… I think ‘the blog move’ is mostly done. I’ve no idea what happened with the old WordPress.com site. If they won’t tell me what was censorable, and also give me access to fix it, then I guess that version of the blog is gone. Oh well, that’s what backups are for.

JURN’s blog now has a new URL at: https://jurn.link/jurnsearch/ and is restored here. You can make up your own mind if it deserved WordPress.com’s abrupt and total blocking.

The only change you’re likely to see is that the former sidebar URLs are now on their own page.

The new blog is also now linked from the main JURN search tool home-page.

Internal URLs have been corrected via a Regex search-replace, including on some of my other blogs. A few of the images on old posts may be broken, but I had a recent haul of images backed up via a Linkbot, in the same WordPress folder-structure, and thus most images should be found when you visit an old post. My example/demo .XLS files have also been restored here at the JURN blog. There may be a few broken .PDF links, but I hardly ever upload those… so finding and fixing them can wait.
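The search-replace itself is simple enough to sketch in Python. The old hostname below is a made-up placeholder, not the blog’s actual pre-move address:

```python
import re

# Placeholder old address; the real pre-move hostname would go here.
OLD_BASE = re.compile(r"https?://example\.wordpress\.com/")
NEW_BASE = "https://jurn.link/jurnsearch/"

def fix_links(text):
    """Rewrite internal URLs in exported post content to the new domain."""
    return OLD_BASE.sub(NEW_BASE, text)
```

Run over the exported .XML (or the database dump) before re-importing.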

The new RSS feed is: https://jurn.link/jurnsearch/index.php/feed/

URL to .torrent

The very worthy Torrent Webseed Creator project at GitHub. Use your free Google Colab space to turn any file freely available on the open Web (up to 100GB) from a plain URL into a handy .torrent file, which is then uploaded to Cloudflare for you. You then download from there as a torrent using standard torrenting software. Tested and working for me: a 32GB test file was uploaded to Cloudflare Canada, from which I torrented it at my leisure.

Useful for those who have to download a huge non-resuming multi-Gb file with the browser, and are repeatedly failing to do so. Note the source files must be freely available without cookies or similar fuss.

Microsoft’s AI voices and Project Gutenberg

Microsoft has used its advanced AI text-to-speech voices to produce free audiobooks of 4,840 Project Gutenberg books.

There’s now a handy single-page Browse list, but it’s A-Z by title rather than by author. So I made a quick Excel .XLS ordered A-Z by author, with the clickable hyperlink retained on each title. The links are to .MP3 files, and of course to fetch them your Excel needs to be allowed online.
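Outside Excel, the same re-ordering is a short Python job. The rows below are invented examples; HYPERLINK() is a standard Excel formula that keeps each title clickable after a CSV import:

```python
def by_author(rows):
    """Sort (title, author, url) rows A-Z by author, case-insensitively."""
    return sorted(rows, key=lambda r: r[1].casefold())

def hyperlink_cell(title, url):
    """An Excel HYPERLINK() formula cell: the title stays clickable."""
    return f'=HYPERLINK("{url}","{title}")'
```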

.XLS for Excel (as PDF): (sadly lost on the blog move)


Tutorial:

There are times when Excel fails to automatically paste into two columns. So here’s how to sort a list such as this ‘by every nth line’, using the handy Kutools plug-in for Excel. No ‘wrestling with formulas’ is needed here…

1. Highlight and Ctrl + C copy the list from the source. The data must be clean, meaning strictly line 1 then the matching line 2, consistently all the way down.

2. Paste the list to the first Excel column.

3. Top tabs bar | Kutools tab | Range | Transform Range.

4. The “Data to be transformed” box is automatically filled in for you. You just need to type in the “Fixed Value”. Here we want “2”. Then check the output demo looks correct. Press OK.

5. You’re taken to the next box. Don’t type anything. Instead you just click on the first cell of your target column. I chose “C”. The “output range” box will then be auto-filled with a bit of Excel’s magic gibberish. Press OK.

6. Processing will then start. Let it run, for a big list. It can happily process over 9,000 lines.

7. All done. The list has been separated into two columns, by taking each pair of lines in turn: lines 1 and 2 become the first row, lines 3 and 4 the next, and so on down the list.

Drop it

Work lost per week, due to Internet connection line-drops:

6 minutes per drop, at four drops per day = 24 minutes a day = 2.8 hours of working time lost per week. Possibly more if you have to crawl under a desk to reboot the router, and thus also have to go wash your hands each time.
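The arithmetic, spelled out:

```python
minutes_per_drop = 6
drops_per_day = 4
minutes_per_day = minutes_per_drop * drops_per_day   # 24 minutes a day
hours_per_week = minutes_per_day * 7 / 60            # 2.8 hours a week
```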

WordPress.com and keyword search of a free blog

Oh dear. WordPress.com appears to have started doing annoying ‘second guessing’ on keyword search in a free blog.

Searched for aviation. Wasted time with results that only have airport or aircraft in them… and no sign of the required aviation keyword.

Searching for “aviation” makes no difference. And it’s not distinguishing between capital-A Aviation and aviation either.

The best option (for those with huge blogs, rather than those who struggle to produce six posts a year) now seems to be to download your .XML archive, and then have that in a folder indexed by desktop search software (DocFetcher etc). The problem there is that the post you’re looking for is likely a recent one, and may not be in the older local archive.
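Once the .XML archive is local, an exact, case-sensitive, whole-word search — the thing the site search no longer does — is straightforward. A sketch, with invented sample posts:

```python
import re

def exact_hits(posts, keyword="aviation"):
    """Titles of posts whose body contains the exact word, case-sensitively.

    `posts` is a list of (title, body) pairs, e.g. extracted from the
    exported .XML archive.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b")
    return [title for title, body in posts if pattern.search(body)]
```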

On linkrot

A new study of linkrot in Digital Humanities Quarterly, “Reference Rot in the Digital Humanities Literature”.

“[in the DHQ sample] over a quarter of sampled citations are links to websites. Over 30% of these references are [now] inaccessible or have additional access barriers.”

Perhaps we need a copyright-busting AI for this? Imagine that, with ‘one press of a button’, a ref-bot AI goes and visits/reads the reference links at the time of the article’s publication, and thus produces a unified set of summaries. Perhaps with each summary weighted towards topics being discussed in the paragraph before the point-of-citation. The result would then be offered alongside the published article, as an appendix. Since AI-made text cannot be in copyright, the publishers’ lawyers would presumably not swoon at such an idea. Of course, the author would then have to fact-check and human-approve it as correct. But that should not be too onerous.