Llamafile – runs LLM AIs as a single .exe

The Mozilla organisation have released a way to make Large Language Model AIs (think ‘ChatGPT’) into normal single-file .EXE files that will run on Windows, Mac, Linux, etc. Llamafile is open source and available now. Sample .EXE files include WizardCoder-Python-13B. Sadly though, the larger ones won’t run on Windows, as that OS has a 4GB limit on the size of .EXE files. So near, yet so far.

Blog moved and fixed

Ok, well… I think ‘the blog move’ is mostly done. I’ve no idea what happened with the old WordPress.com site. If they won’t tell me what was censorable, and also give me access to fix it, then I guess that version of the blog is gone. Oh well, that’s what backups are for.

JURN’s blog now has a new URL at: https://jurn.link/jurnsearch/ and is restored here. You can make up your own mind if it deserved WordPress.com’s abrupt and total blocking.

The only change you’re likely to see is that the former sidebar URLs are now on their own page.

The new blog is also now linked from the main JURN search tool home-page.

Internal URLs have been corrected via Regex search-replace, including my other blogs. A few of the images on old posts may be broken. But I had a recent haul of images backed up via a Linkbot, in the same WordPress folder-structure, and thus most images should be found when you visit an old post.

My example/demo .XLS files have also been restored here at the JURN blog. There may be a few broken .PDF links, but I hardly ever upload those… so finding and fixing them can wait.
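For anyone facing a similar move, here is a rough sketch of the sort of Regex search-replace involved, done in Python. It is not the exact pattern I ran. The old hostname is a made-up placeholder, and the assumption that old post paths carry over unchanged to the new address may not hold for every blog.

```python
import re

# Hypothetical placeholder: the real old hostname is not given in this post.
OLD_HOST = "oldblog.example.wordpress.com"
# New base address, assuming old post paths carry over unchanged.
NEW_BASE = "https://jurn.link/jurnsearch"

# Match http or https links to the old host, capturing any path that follows.
OLD_URL = re.compile(
    r"https?://" + re.escape(OLD_HOST) + r"(?P<path>/[^\s\"'<>]*)?"
)

def relink(html: str) -> str:
    """Point every link to the old host at the same path on the new site."""
    return OLD_URL.sub(lambda m: NEW_BASE + (m.group("path") or "/"), html)

sample = '<a href="https://oldblog.example.wordpress.com/2023/11/30/llamafile/">post</a>'
print(relink(sample))
```

Run it on a copy of the export file first, and spot-check a few old posts before trusting the result.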

The new RSS feed is: https://jurn.link/jurnsearch/index.php/feed/

URL to .torrent

The very worthy Torrent Webseed Creator project is on GitHub. Use your free Google Colab space to turn any file freely available on the open Web, up to 100GB, from a plain URL into a handy .torrent file, which is then uploaded to Cloudflare for you. You then download from there as a .torrent, using any standard torrenting software. Tested and working for me. The 32GB test file was uploaded to Cloudflare Canada, from which I torrented it at my leisure.

Useful for those who have to download a huge non-resuming multi-GB file with the browser, and are repeatedly failing to do so. Note the source files must be freely available without cookies or similar fuss.

Microsoft’s AI voices and Project Gutenberg

Microsoft has used its advanced AI text-to-speech voices to produce free audiobooks of 4,840 Project Gutenberg books.

There’s now a handy single-page Browse list, but it’s A-Z by title rather than by author. So I made a quick Excel .XLS ordered A-Z by author, with the clickable hyperlink retained on each title. The links are to .MP3 files, and of course to get them your Excel needs to be allowed online.

.XLS for Excel (as PDF): (sadly lost on the blog move)


Tutorial:

There are times when Excel fails to automatically paste into two columns. So here’s how to sort a list such as this ‘by every nth line’, using the handy Kutools plug-in for Excel. No ‘wrestling with formulas’ is needed here…

1. Highlight and Ctrl + C copy the list from the source. The data must be clean, meaning strictly line 1 then the matching line 2, consistently all the way down.

2. Paste the list to the first Excel column.

3. Top tabs bar | Kutools tab | Range | Transform Range.

4. The “Data to be transformed” box is automatically filled in for you. You just need to type in the “Fixed Value”. Here we want “2”. Then check the output demo looks correct. Press OK.

5. You’re taken to the next box. Don’t type anything. Instead you just click on the first cell of your target column. I chose “C”. The “output range” box will then be auto-filled with a bit of Excel’s magic gibberish. Press OK.

6. Processing will then start. Let it run, for a big list. It can happily process over 9,000 lines.

7. All done. The list has been separated into two columns, by taking each line and then the line that follows it, and so on down the column.
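If you don’t have Kutools, the same ‘every nth line’ split can be sketched in a few lines of Python. This is only an illustration of the idea, not what Kutools does internally. The function name and the sample titles are made up for the demo.

```python
def split_every_nth(lines, n=2):
    """Lay a flat list out as rows of n cells, reading left to right."""
    if len(lines) % n != 0:
        # Mirrors step 1: the data must be clean, with no stray lines.
        raise ValueError("list length is not a multiple of n")
    return [lines[i:i + n] for i in range(0, len(lines), n)]

# Made-up sample: a title on one line, its author on the next.
pasted = ["Dracula", "Bram Stoker", "Emma", "Jane Austen"]

for row in split_every_nth(pasted):
    # Tab-separated rows paste straight into two Excel columns.
    print("\t".join(row))
```

Here n plays the part of the “Fixed Value” in step 4.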

Drop it

Work lost per week, due to Internet connection line-drops:

6 minutes per drop, at four per day = 24 minutes a day = 2.8 hours of working time lost per week (counting all seven days). Possibly more if you have to crawl under a desk to reboot the router, and thus also have to go wash your hands each time.
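The sum, as a tiny Python sketch for anyone who wants to plug in their own numbers. The seven-day week is my reading of the 2.8 hours figure. The constant names are just for the demo.

```python
MINUTES_PER_DROP = 6
DROPS_PER_DAY = 4
DAYS_PER_WEEK = 7  # assumption: 2.8 hours implies counting all seven days

minutes_per_day = MINUTES_PER_DROP * DROPS_PER_DAY
hours_per_week = minutes_per_day * DAYS_PER_WEEK / 60

print(minutes_per_day, "minutes a day")
print(round(hours_per_week, 1), "hours a week")
```

Change DAYS_PER_WEEK to 5 for a Monday to Friday week.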

WordPress.com and keyword search of a free blog

Oh dear. WordPress.com appears to have started doing annoying ‘second guessing’ on keyword search in a free blog.

Searched for aviation. Wasted time with results that only have airport or aircraft in them… and no sign of the aviation keyword required.

Searching for “aviation” makes no difference. And it’s not distinguishing between capital-A Aviation and aviation either.

The best option (for those with huge blogs, rather than those who struggle to produce six posts a year) now seems to be to download your .XML archive, and then have that in a folder indexed by desktop search software (DocFetcher etc). The problem there is that the post you’re looking for is likely a recent one, and may not be on the older local archive.
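If you would rather not install desktop search software, a strict whole-word search of the export can be sketched in Python. This assumes the usual WordPress export layout, with each post as an item holding a title and a content:encoded body. The function name and the two sample posts are made up.

```python
import re
import xml.etree.ElementTree as ET

# Namespace that WordPress export files use for the full post body.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

def find_posts(xml_text, keyword):
    """Return titles of posts whose title or body holds the exact word."""
    # \b keeps it to whole words; drop IGNORECASE to respect capitals.
    word = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for item in ET.fromstring(xml_text).iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("content:encoded", default="", namespaces=NS)
        if word.search(title) or word.search(body):
            hits.append(title)
    return hits

# Made-up miniature export, just to show the behaviour.
sample = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel>
<item><title>Airport news</title><content:encoded>A new aircraft hangar.</content:encoded></item>
<item><title>Old journals</title><content:encoded>Early aviation magazines.</content:encoded></item>
</channel></rss>"""

print(find_posts(sample, "aviation"))
```

For a real archive you would read the .XML file from disk and pass its text in. The ‘recent posts are missing’ problem remains, of course.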

On linkrot

A new study of linkrot in Digital Humanities Quarterly, “Reference Rot in the Digital Humanities Literature”.

“[in the DHQ sample] over a quarter of sampled citations are links to websites. Over 30% of these references are [now] inaccessible or have additional access barriers.”

Perhaps we need a copyright-busting AI for this? Imagine that, with ‘one press of a button’, a ref-bot AI goes and visits/reads the reference links at the time of the article’s publication, and thus produces a unified set of summaries. Perhaps with each summary weighted towards topics being discussed in the paragraph before the point-of-citation. The result would then be offered alongside the published article, as an appendix. Since AI-made text cannot be in copyright, the publishers’ lawyers would presumably not swoon at such an idea. Of course, the author would then have to fact-check and human-approve it as correct. But that should not be too onerous.

Hiding all Amazon results containing a keyword

How to hide all Amazon search results containing the word Bluetooth.

Why use this:

i) Let’s say you are searching for wireless headphones. You want headphones with a proper radio-frequency wireless base-station that has a rock-solid 100-yard range, and not those that use the infernal and unreliable Bluetooth system. You thus want to remove the vast number of Bluetooth headphones from your search results. But Amazon’s filtering system won’t allow you to do that.

ii) Or perhaps you simply want to remove all results with a title containing your own chosen keyword. Again, this assumes that Amazon lacks the required sidebar filtering, and that you have hundreds of results to manually trawl through. In which case, just change the keyword used below.

Required:

Use this simple code with the popular free Web browser add-on “uBlock Origin”, by adding it to uBlock’s filter list. Simply paste the code to the list and save.

! Hide all search results on Amazon which contain bluetooth in the title
amazon.co.uk##[data-component-type="s-search-result"]:has-text(/bluetooth/i)

Of course you should also change amazon.co.uk to whatever your usual national Amazon store is, if you’re not in the UK.

You should not find it also interfering with your Wishlist pages, but if you do then whitelist in uBlock’s ‘Trusted Sites’ thus…

www.amazon.co.uk/hz/wishlist/ls/*
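For the curious, the /bluetooth/i part of the filter is an ordinary case-insensitive regular expression, tested against the text of each search result. Here is a small Python sketch of that matching logic only. It is not how uBlock Origin is implemented, and the product titles are invented.

```python
import re

# Same pattern as the filter: "bluetooth" anywhere, ignoring case.
BLOCKED = re.compile(r"bluetooth", re.IGNORECASE)

def visible_results(titles):
    """Keep only the results whose text does not match the pattern."""
    return [t for t in titles if not BLOCKED.search(t)]

# Invented product titles for the demo.
titles = [
    "RF Wireless Headphones with Base Station",
    "Bluetooth 5.3 Earbuds",
    "Over-ear BLUETOOTH Headset",
]

print(visible_results(titles))
```

Swap the word in the pattern for your own keyword, just as you would in the filter itself.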

Thanks to RraaLL for suggesting an improvement to my initial way of doing it. Post updated.

Block by keyword with uBlock

Google Search is now adding “People also searched for” pop-down panels, placed under individual search results. These often appear when you use the back button to return to an earlier page of results.

I don’t want any kind of ‘pops’ in my search-results. Block them all in your uBlock Origin filter list, by adding this filter…

The above is also a working demo of how to use an xpath command to block any keyword inside a DIV’s ID. In this case the filter blocks all HTML DIVs with an internal ID containing the letters “eob”. This blocking is not constrained to just these letters, meaning that the command will also block “eob77” or “eob_34”, without the need for a wildcard * symbol. This is required for Google Search, as all the “eob” instances have a number after them.
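To show why no wildcard is needed, here is a Python sketch of the same ‘contains’ test on DIV IDs. It only mimics the matching idea and is not uBlock or XPath code. The class and function names, and the “rso” and SPAN examples, are made up for the demo.

```python
from html.parser import HTMLParser

class DivIdCollector(HTMLParser):
    """Collect the IDs of DIVs whose ID contains a given run of letters."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id") or ""
            # A plain substring test: "eob" also catches "eob77" and "eob_34".
            if self.fragment in div_id:
                self.matches.append(div_id)

def blocked_div_ids(html, fragment="eob"):
    collector = DivIdCollector(fragment)
    collector.feed(html)
    return collector.matches

# Made-up page fragment. Only DIVs count, so the SPAN is left alone.
page = (
    '<div id="eob_34">x</div><div id="rso">y</div>'
    '<div id="eob77">z</div><span id="eob1"></span>'
)

print(blocked_div_ids(page))
```

Be aware that a ‘contains’ test is greedy. It would also catch an ID such as “geobox”, so pick letters that are unlikely to turn up elsewhere on the page.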

Another example would be to block all ‘Save’ pop-overs on Bing Images…