Instant AI audio transcription, in a $300 box

21 Saturday Oct 2023

Posted by futurilla in Spotted in the news

Instant AI audio transcription / translation, in a small offline box which can “translate 15 different languages” in real-time. For $300, fully private and offline, no corporate subscription required. Crowdfunding now, with the back end freely available under an open licence at github.com/usefulsensors/useful-transformers.

Microsoft’s AI voices and Project Gutenberg

20 Wednesday Sep 2023

Posted by futurilla in JURN tips and tricks

Microsoft has used its advanced AI text-to-speech voices to produce free audiobooks of all 4,840 Project Gutenberg books.

There’s now a handy single-page Browse list, but it’s A-Z by title rather than by author. So I made a quick Excel .XLS ordered A-Z by author, with the clickable hyperlink retained on each title. The links are to .MP3 files, and of course to get them your Excel needs to be allowed online.

.XLS for Excel (as PDF): (sadly lost on the blog move)

Tutorial:

There are times when Excel fails to automatically paste into two columns. So here’s how to sort a list such as this ‘by every nth line’, using the handy Kutools plug-in for Excel. No ‘wrestling with formulas’ is needed here…

1. Highlight and Ctrl + C copy the list from the source. The data must be clean, meaning strictly line 1 then the matching line 2, consistently all the way down.

2. Paste the list to the first Excel column.

3. Top tabs bar | Kutools tab | Range | Transform Range.

4. The “Data to be transformed” box is automatically filled in for you. You just need to type in the “Fixed Value”. Here we want “2”. Then check the output demo looks correct. Press OK.

5. You’re taken to the next box. Don’t type anything. Instead you just click on the first cell of your target column. I chose “C”. The “output range” box will then be auto-filled with a bit of Excel’s magic gibberish. Press OK.

6. Processing will then start. Let it run, for a big list. It can happily process over 9,000 lines.

7. All done. The list has been separated into two columns, by taking each line and then the line that follows it, pair by pair, all the way down the list.
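
For those without Kutools, the same ‘every nth line’ split can be sketched in a few lines of Python. This is only an illustration of the idea, not what Kutools does internally. The function name `rows_of` and the sample titles are made up for the example, and it assumes the list runs title, author, title, author.

```python
def rows_of(lines, n=2):
    """Group a flat list into rows of n cells: lines 1..n, then n+1..2n, etc."""
    if len(lines) % n:
        # Mirrors the 'data must be clean' requirement in step 1.
        raise ValueError("list length must be a multiple of n")
    return [tuple(lines[i:i + n]) for i in range(0, len(lines), n)]


# Hypothetical pasted list: a title line, then its matching author line.
flat = [
    "Pride and Prejudice", "Austen, Jane",
    "Dracula", "Stoker, Bram",
    "Emma", "Austen, Jane",
]

rows = rows_of(flat, 2)

# Re-order A-Z by author (second cell), then by title (first cell).
by_author = sorted(rows, key=lambda row: (row[1], row[0]))

for title, author in by_author:
    print(f"{author}\t{title}")
```

The tab-separated output can be pasted straight back into a spreadsheet as two columns.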

Drop it

10 Saturday Jun 2023

Posted by futurilla in Ooops!

Work lost per week, due to Internet connection line-drops:

6 minutes per drop, at four per day = 24 minutes a day = 2.8 hours working time lost per week. Possibly more if you have to crawl under a desk to reboot the router, and thus also have to go wash your hands each time.
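
The sum above, written out as a small Python check. Note the assumption it exposes: the 2.8 hours figure only works out for a seven-day week. The constant names are just labels for this example.

```python
MINUTES_PER_DROP = 6
DROPS_PER_DAY = 4
DAYS_PER_WEEK = 7  # assumption: the 2.8 hours figure implies a seven-day week

minutes_per_day = MINUTES_PER_DROP * DROPS_PER_DAY
hours_per_week = minutes_per_day * DAYS_PER_WEEK / 60

print(minutes_per_day, "minutes a day")
print(round(hours_per_week, 1), "hours a week")
```

On a five-day working week the same sum gives 2 hours.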

WordPress.com and keyword search of a free blog

08 Thursday Jun 2023

Posted by futurilla in My general observations

Oh dear. WordPress.com appears to have started doing annoying ‘second guessing’ on keyword search in a free blog.

Searched for aviation. Wasted time with results that only have airport or aircraft in them… and no sign of the aviation keyword required.

Putting the keyword in quote marks, as “aviation”, makes no difference. And it’s not distinguishing between capital-A Aviation and aviation either.

The best option (for those with huge blogs, rather than those who struggle to produce six posts a year) now seems to be to download your .XML archive, and then have that in a folder indexed by desktop search software (DocFetcher etc). The problem there is that the post you’re looking for is likely a recent one, and may not be on the older local archive.
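
As an alternative to desktop search software, a strict whole-word search of the downloaded archive can be sketched in Python. This assumes the export is the usual WordPress RSS-style XML, with each post as an `item` holding a `title` and a `content:encoded` body. The function name `find_posts` and the sample posts are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used for the post body in RSS-style exports (assumed here).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Tiny stand-in for a real export file.
SAMPLE = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<item><title>Airport notes</title><content:encoded>The new airport terminal.</content:encoded></item>
<item><title>Early flight</title><content:encoded>A history of aviation in the 1920s.</content:encoded></item>
</channel>
</rss>"""


def find_posts(xml_text, keyword, case_sensitive=False):
    """Return titles of posts containing the exact keyword as a whole word."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", flags)
    hits = []
    for item in ET.fromstring(xml_text).iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext(CONTENT_NS + "encoded", default="")
        if pattern.search(title) or pattern.search(body):
            hits.append(title)
    return hits


print(find_posts(SAMPLE, "aviation"))
print(find_posts(SAMPLE, "Aviation", case_sensitive=True))
```

For a real archive, read the file in with `ET.parse(path).getroot()` instead of `ET.fromstring`. There is no second-guessing here: a post that only mentions airport or aircraft is not returned.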

On linkrot

02 Friday Jun 2023

Posted by futurilla in Academic search

A new study of linkrot in Digital Humanities Quarterly, “Reference Rot in the Digital Humanities Literature”.

“[in the DHQ sample] over a quarter of sampled citations are links to websites. Over 30% of these references are [now] inaccessible or have additional access barriers.”

Perhaps we need a copyright-busting AI for this? Imagine that, with ‘one press of a button’, a ref-bot AI goes and visits/reads the reference links at the time of the article’s publication, and thus produces a unified set of summaries. Perhaps with each summary weighted towards topics being discussed in the paragraph before the point-of-citation. The result would then be offered alongside the published article, as an appendix. Since AI-made text cannot be in copyright, the publishers’ lawyers would presumably not swoon at such an idea. Of course, the author would then have to fact-check and human-approve it as correct. But that should not be too onerous.

Hiding all Amazon results containing a keyword

27 Saturday May 2023

Posted by futurilla in JURN tips and tricks

How to hide all Amazon search results containing the word Bluetooth.

Why use this:

i) Let’s say you are searching for wireless headphones. You want headphones with a proper radio-frequency wireless base-station that has a rock-solid 100-yard range, and not those that use the infernal and unreliable Bluetooth system. You thus want to remove the vast number of Bluetooth headphones from your search results. But Amazon’s filtering system won’t allow you to do that.

ii) Or perhaps you simply want to remove all results with a title containing your own chosen keyword. Again, this assumes that Amazon lacks the required sidebar filtering, and that you have hundreds of results to manually trawl through. In which case, just change the keyword used below.

Required:

Use this simple code with the popular free Web browser add-on “uBlock Origin”, by adding it to uBlock’s filter list. Simply paste the code to the list and save.

! Hide all search results on Amazon which contain bluetooth in the title
amazon.co.uk##[data-component-type="s-search-result"]:has-text(/bluetooth/i)

Of course you should also change amazon.co.uk to whatever your usual national Amazon store is, if you’re not in the UK.

You should not find it also interfering with your Wishlist pages, but if you do then whitelist in uBlock’s ‘Trusted Sites’ thus…

www.amazon.co.uk/hz/wishlist/ls/*

Thanks to RraaLL for suggesting an improvement to my initial way of doing it. Post updated.
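
For anyone curious about the `/bluetooth/i` part of the filter: it is a case-insensitive regular expression. The Python sketch below shows that kind of match on some made-up product titles. It only illustrates the matching, not how uBlock Origin works internally, and the function name `visible_results` is invented for the example.

```python
import re

# Same idea as /bluetooth/i in the filter: match the word in any letter case.
KEYWORD = re.compile(r"bluetooth", re.IGNORECASE)


def visible_results(titles):
    """Keep only the results whose text does not contain the keyword."""
    return [title for title in titles if not KEYWORD.search(title)]


# Hypothetical search results.
titles = [
    "RF Wireless Headphones with Charging Base Station",
    "BLUETOOTH 5.3 Over-Ear Headphones",
    "Wireless TV Headphones (Bluetooth)",
]

print(visible_results(titles))
```

Only the first title survives. As far as I understand it, uBlock's has-text test looks at all the text inside each search-result block rather than the title alone, so a result that mentions the keyword anywhere in its listing text is likely to be hidden too.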

Block by keyword with uBlock

27 Saturday May 2023

Posted by futurilla in JURN tips and tricks

Google Search is now adding “People also searched for” pop-down panels, placed under individual search results. These often appear on using the back button to go back to a page of former results.

I don’t want any kind of ‘pops’ in my search-results. Block them all in your uBlock Origin filter list, by adding this filter…

The above is also a working demo of how to use an xpath command to block any keyword inside a DIV’s ID. In this case the filter blocks all HTML DIVs with an internal ID containing the letters “eob”. This blocking is not constrained to just these letters, meaning that the command will also block “eob77” or “eob_34”, without the need for a wildcard * symbol. This is required for Google Search, as all the “eob” instances have a number after them.
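
The ‘contains’ behaviour described here can be illustrated outside the browser. This Python sketch is not the uBlock filter itself, only a demonstration of why no wildcard is needed: a plain substring test on the ID already catches “eob77” and “eob_34”. The class name `DivIdFinder` and the sample HTML are invented for the example. In general XPath terms the equivalent test would be written along the lines of `//div[contains(@id, "eob")]`.

```python
from html.parser import HTMLParser


class DivIdFinder(HTMLParser):
    """Collect the IDs of all DIVs whose ID contains a given fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        div_id = dict(attrs).get("id") or ""
        # Plain substring test: no wildcard symbol required.
        if self.fragment in div_id:
            self.matches.append(div_id)


# Hypothetical page fragment.
SAMPLE = '<div id="eob77"></div><div id="eob_34"></div><div id="rso"></div>'

finder = DivIdFinder("eob")
finder.feed(SAMPLE)
print(finder.matches)
```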

Another example would be to block all ‘Save’ pop-overs on Bing Images…

New book: Athena Unbound

25 Thursday May 2023

Posted by futurilla in Spotted in the news

A new free book from a UCLA historian, Athena Unbound: Why and How Scholarly Knowledge Should Be Free for All. Partly a history (no mention of JURN, though), and partly another stab at ‘how to make OA work’ in the future. There’s also a podcast interview with the author, albeit revealing some rather interesting assumptions. Such as…

“ChatGPT as I understand it at the moment scrapes and feeds off of the crappy end of the Web … I don’t think it’s able to get past the paywalls and into the scholarly databases and into the journals, as far as I know. So insofar as that’s true, then all we’re getting is a garbage-in, garbage-out product from ChatGPT … good ChatGPT should be based on the stuff that right now the paywalls keep us out of.”

The idea that worthy content is only to be found behind a paywall will raise an eyebrow among many OA publishers and indexers. He also makes the even more questionable assumption that piracy no longer exists in non-academic content (movies, games, TV, software, comics, instructional videos etc). But those assumptions aside, his core points are thought-provoking…

i) It certainly would be interesting if an AI could be trained purely on a critical mass of non-science / non-medical academic journal texts. On say… Sci-Hub’s PDFs, Semantic Scholar’s PDFs (which I’m assuming subsumes the DOAJ’s relatively small PDF holdings), and perhaps even all the PDFs that could theoretically be harvested after spidering JURN’s index URLs. So far as I’m aware, in the admittedly blisteringly fast development of AIs, there’s nothing like that just yet. None of those three gives complete coverage of course. But even in a partial early form such an AI would be interesting to have.

ii) He also raises the question of copyright in the output of such journal-ingesting AIs. If the pure unaltered text product of an AI cannot be copyrighted, he suggests that many will come to prefer the AI’s potted answers over struggling with the actual (paid) articles from which it was hashed. I’d add that what they won’t prefer to do, most likely, is then to laboriously hand-check the AI’s factual claims, logic, references, etc that may trip them up in a follow-on use of the text. Also the errors of taste and historical knowledge that will likely occur with scholarly arts/humanities AIs, such as we already see in dumb taste-matching software on store sites. For instance, assuming that Ziggy-era Bowie is the same as Eno-era Bowie and Tin Machine-era Bowie, or that if you like The Hobbit you will also enjoy The Silmarillion.

That said, Elon Musk and others are already reported to be working on fact-checking and check-able ‘citation finding’ AIs. Daisy-chained workflows between very different AIs will likely emerge, and doubtless there will even be AIs which can suggest and optimise such daisy-chains. Part of such chains will likely be AI modules which try to strip out “AI-ness” and also steganographic watermarking and suchlike, and attempt to add “human-ness” to the look and feel of the sale-able end product. Perhaps even filters for glaring “errors of taste” in matters relating to art and literature.

Release: PDF Index Generator 3.3

24 Wednesday May 2023

Posted by futurilla in Spotted in the news

A new version of PDF Index Generator, the first in a year. It’s the best standalone desktop software for making back-of-the-book indexes from finished PDFs. New in version 3.3 (May 2023), among other changes…

* Can now run-in the sub-headers (rather than doing list-style sub-headings). Video.

* Multi-page indexing (e.g. 265-278) can now be truncated (as 265-78). Video.

* Even/odd pages can now have their own margins set. Video.

* “Added an Include query … to index capitalized words”.

* Database files are now much smaller.

* “Fixed footnotes as it was showing footnote number & normal page number too!”

That last one is especially important for footnoting scholars. The footnotes feature was introduced in 2.9 (February 2020).

The software is still working all the way back to Windows XP, and is still the same price.
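
On the page-range truncation mentioned in the list above (265-278 shown as 265-78), here is a rough Python sketch of one simple way such shortening can be done. It is not PDF Index Generator's own rule, which I have not seen. The function name `abbreviate_range` is invented, and the rule assumed is ‘drop the leading digits that both numbers share, but always keep at least two digits’.

```python
def abbreviate_range(start, end):
    """Shorten a page range, e.g. 265-278 becomes 265-78."""
    if end < start:
        raise ValueError("end page must not come before start page")
    s, e = str(start), str(end)
    if len(s) != len(e):
        # Different lengths (e.g. 98-102): leave the range in full.
        return f"{s}-{e}"
    i = 0
    # Skip shared leading digits, keeping at least two digits of the end page.
    while i < len(e) - 2 and s[i] == e[i]:
        i += 1
    return f"{s}-{e[i:]}"


print(abbreviate_range(265, 278))
print(abbreviate_range(265, 378))
print(abbreviate_range(98, 102))
```

Style guides differ on the finer points of this, so treat the rule here as one convention among several.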