• Directory
  • FAQ: about JURN
  • Group tests
  • Guide to academic search
  • JURN’s donationware
  • Links
  • openEco: titles indexed

News from JURN

~ search tool for open access content

Author Archives: futurilla

CSS and desist

Monday 11 May 2020

Posted by futurilla in JURN tips and tricks

Ever wonder what a Web page would look like if just the plain HTML were shown, as if it were 1996 again? Ah, but which HTML? There are actually two forms of HTML in play in your browser, and there can be quite a disparity between them.

The first is what you see after the raw HTML has been pummelled about by javascript and by the browser’s own interpretation of it, and then styled by CSS. This is often referred to as the ‘DOM’ HTML. It is the code you see and navigate through if you ‘Inspect element’ in your browser, or if you block an element with uBlock’s Element Picker tool.

The second type is the HTML code that gets sent to the browser in the first place. That original is kept pristine, effectively ‘under’ the Web page, and can be seen via right-click on the page / ‘View page source’. This source can then be selected and copied with Ctrl + A, then Ctrl + C. Or it can be saved out via ‘Save page as…’ / ‘Save as HTML only’, and the saved page can then be re-opened in the browser. Some remote CSS, javascript and images may still be called, even then.

A quicker way to ‘see’ this original without its CSS and other ‘remote-code’ flibbertigibbets is to install the add-on disable-HTML in your browser…

The add-on is simple to use and, though old, still works fine. It is somewhat misnamed, since what it does is robustly block everything except the HTML of the ‘page source’ itself. With CSS and javascript blocked, the DOM version of the HTML has no chance to drift away from the page source, so what you see displayed on page reload is effectively the page source. As such it can be quite handy for removing some especially tough and obstreperous CSS-and-javascript-driven overlays, in situations where you don’t much care about the fancy wrapping and just want the words in a readable and copy-able form. The vile overlays on the unherd.com site are one example.
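
For anyone who would rather script this ‘just the words from the page source’ approach, a rough equivalent can be sketched with Python’s built-in html.parser. It is a minimal sketch, assuming you already have the page source as a string (for instance saved via ‘View page source’); the class and function names are illustrative, not from any existing tool. It drops the contents of <script> and <style> elements and keeps all other text, so any wording that sits in the source HTML itself, including an overlay’s own message, will still come through.

```python
from html.parser import HTMLParser


class SourceTextExtractor(HTMLParser):
    """Collect the text of raw page-source HTML, skipping <script> and <style>.

    Works on the HTML as sent by the server, so nothing that javascript would
    later add or rearrange in the live DOM is involved.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # non-empty text fragments, in source order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_source_text(html):
    """Return the readable words from a page-source HTML string, one chunk per line."""
    parser = SourceTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = ('<html><head><style>.overlay{position:fixed}</style>'
              '<script>showOverlay();</script></head>'
              '<body><p>The actual article text.</p></body></html>')
    print(page_source_text(sample))
```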

Regex 2020

Monday 11 May 2020

Posted by futurilla in JURN tips and tricks

I’ve expanded my free PDF file, “Some useful regex commands for Notepad++”, which I first released here in May 2019. Here’s the new one…


Update: it has now been updated again, as My Little Regex Cookbook, for Notepad++ (September 2020).

AnswerThePublic.com

Wednesday 6 May 2020

Posted by futurilla in Spotted in the news

AnswerThePublic.com is an interesting new search tool. Instead of searching for answers, it tries to pick up the questions being asked about products. It promises “a direct line to your customers’ thoughts”, or at least to the thoughts of those customer-users who are savvy enough, and non-expert enough, to pose a well-formed question in the right place.

The search tool seems to be limited to three searches per day, after which your time is up and you’re shown this slice of cheese…

The results format is quite elegantly graphical and useful, though I can’t screenshot and discuss these here because… I’ve had my three searches. There was some kind of linkback to Google Search on some of the results tabs, which seemed to make it even more useful.

New from Google Research

Tuesday 5 May 2020

Posted by futurilla in Academic search, Spotted in the news

Google Research has launched COVID-19 Research Explorer. This has “a semantic search interface” that enables better search and discovery across “more than 50,000 journal articles and preprints”.

.JSON to .CSV with Windows freeware

Saturday 2 May 2020

Posted by futurilla in JURN tips and tricks

Situation: You need to cleanly extract just the usernames from a .JSON file, and place each name on a new line. The result should look like…

name1
name2
name3

The value being extracted from the JSON could just as easily be email addresses, or map co-ordinates, or suchlike.

Why you might do this: Sometimes you can’t just do a simple “.JSON out, .JSON in”. For instance, let’s say you have a Web browser add-on that offers a blocklist function based on usernames. Perhaps it’s the deviantART-Filter. You want to port this 1,600-name blocklist of scum-and-villainy over to a similar browser add-on. Perhaps it’s the dA_ignore UserScript. The old deviantART-Filter usefully exports a .JSON file of your blocked users. But the new dA_ignore can only import a pasted-in list of usernames, one per line.

Solution 1:

There is a working regex for doing this, which requires only a recent copy of Notepad++ and a suitable .JSON file for the process…

It’s been tested and works. The use of the * wildcard will enable the extraction of a list in the form…

path_label":"any_value_here

After the regex has run, you are left with a list that looks like…

username":"name1
username":"name2
username":"name3

… and you need only do a basic search/replace to clear username":" to obtain your cleaned list of the various unique usernames.

In this use-case you go to your own DeviantArt Settings page, and paste in the new ported-over blocklist.
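
The Notepad++ regex itself isn’t reproduced here. As a comparable sketch (not the original expression), Python’s re module can do the extraction and the clean-up in one pass, because a capture group returns only the value that follows username":". It assumes each value is a plain JSON string with no escaped quote marks inside it, and Python’s regex dialect is not identical to the one Notepad++ uses. The pattern and function names are illustrative.

```python
import re

# Comparable pattern, not the original Notepad++ regex: capture the value of
# every "username" key, assuming no escaped quotes (\") inside the value.
USERNAME_PATTERN = re.compile(r'"username"\s*:\s*"([^"]*)"')


def extract_usernames(json_text):
    """Return each username value, in file order."""
    return USERNAME_PATTERN.findall(json_text)


if __name__ == "__main__":
    sample = '[{"username":"name1","date":1},{"username":"name2"},{"username":"name3"}]'
    print("\n".join(extract_usernames(sample)))  # one name per line
```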


Solution 2:

1. Open the .JSON with the genuine Windows freeware JSONedit.

2. Filter the JSON on field “username” by typing username into the search box. In ‘List view’ you should now see only a list of the “username” fields and the adjacent data entries.

3. Then go: top menu | ‘Tools’ | ‘Export as .CSV’.

4. Add a .CSV file extension to the saved file if needed, and then open it with Excel. Select and copy the whole “username” column into a new Notepad++ file.

5. In Notepad++, a quick search-replace will then remove any leftover quotation marks ("). You now have a clean, portable list, with one username per line.
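
Steps 4 and 5 could also be done without Excel, using Python’s csv module, which strips the surrounding quote marks as it reads each field. A minimal sketch, assuming the exported file has a header row with a column named username; that column name is an assumption, so check the header JSONedit actually writes.

```python
import csv
import io


def column_to_lines(csv_text, column="username"):
    """Return one value per line from the named CSV column.

    `column` is an assumed header name. csv.DictReader removes the quote
    marks around each field, so no separate search/replace is needed.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [row[column] for row in reader if row.get(column)]
    return "\n".join(values)


if __name__ == "__main__":
    exported = 'username,date\n"name1",2019\n"name2",2020\n"name3",2020\n'
    print(column_to_lines(exported))
```

To run it on a real export, read the file with open(path, newline="", encoding="utf-8") and pass its contents to column_to_lines.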


Solution 3:

There are, of course, cloud services in Whereizitagain that may well offer to do this. But the methods above use local Windows freeware, so your data never leaves your own PC, which makes them more secure.
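
If Python 3 is installed, its standard json module offers another fully local route that skips the intermediate CSV. The layout of the deviantART-Filter export isn’t shown here, so this sketch assumes nothing about it beyond the key name: it walks the whole parsed structure and collects every string stored under "username", at any depth. The sample data and the function name are invented for illustration.

```python
import json


def collect_values(node, key):
    """Recursively gather every string stored under `key`, at any depth."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                found.extend(collect_values(v, key))  # look deeper
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


if __name__ == "__main__":
    # Invented sample; the real export's layout may differ.
    exported = ('{"version": 2, "users": [{"username": "name1"}, '
                '{"username": "name2"}, {"username": "name3"}]}')
    print("\n".join(collect_values(json.loads(exported), "username")))
```

For a real file, something like `with open("blocklist.json", encoding="utf-8") as f: data = json.load(f)` would load it, where blocklist.json is a hypothetical file name.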

.org saved, for now

Friday 1 May 2020

Posted by futurilla in Spotted in the news

“SaveDotOrg campaign succeeds, as ICANN rejects .ORG sale”. Excellent news for .org site owners. A sale could have allowed a new owner to loot and pillage .org by drastically hiking up everyone’s registration fees. That’s off the table for the moment, but The Internet Society still wants to find a new “faithful owner” in due course, one who can offer “protection against censorship and financial exploitation” for .org sites.

UK scraps 20% sales tax early, for e-journals

Thursday 30 Apr 2020

Posted by futurilla in Spotted in the news

The UK government has just announced that “Plans to scrap VAT on e-publications have been fast-tracked, and will come into force tomorrow”. VAT is the UK’s main sales tax. This should mean cheaper lockdown e-reading and research, so long as publishers and Amazon don’t just keep prices the same and pocket the former VAT as extra profit. The move covers “e-books, e-newspapers, e-magazines and academic e-journals”, but seemingly not audiobooks. The change will be permanent, and had originally been scheduled for December 2020.
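
One small point of arithmetic: UK VAT is charged at 20% on the pre-tax price, so a fully passed-on saving cuts the shelf price by one-sixth, not by 20%. A worked example with a hypothetical £12.00 e-publication:

$$
P_{\text{ex VAT}} = \frac{P_{\text{inc VAT}}}{1.20} = \frac{\pounds 12.00}{1.20} = \pounds 10.00,
\qquad
\frac{\pounds 12.00 - \pounds 10.00}{\pounds 12.00} = \frac{1}{6} \approx 16.7\%
$$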

The UK government will also spend £35 million on ‘public education’ print ads in newspapers over the next three months. This will be “split between local, regional and national print media”, with what appears to be a strong tilt toward what the government calls the “most-trusted” print newspapers. This may imply that the shoddy, slipshod and alarmist reporting we’ve seen could be about to have financial consequences for the newspapers responsible.

Dumb devices

Saturday 25 Apr 2020

Posted by futurilla in Ooops!

Oh dear, it’s 2020 and the biggest and most AI-powered services on the planet are still relying on dumb keyword-blocking. AbeBooks reports that the pulp sci-fi double-bill paperback Mask of Chaos/The Star Virus has been classed by Amazon as a “medical device” and banned from sale.

Ironically, Amazon is still listing bat faeces sent from China, delivered to your door here in the UK. Apparently medical pseudo-science believes it to be a remedy for poor eyesight.
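
To illustrate why plain keyword matching misfires like this, here is a deliberately naive filter in Python. How Amazon’s systems actually classified the book isn’t public; the watch-list words and function name below are invented for the example.

```python
# Hypothetical watch-list; the real rules Amazon used are not public.
WATCH_LIST = {"virus", "mask"}


def naive_is_blocked(title):
    """Flag a listing if any watch-list word appears anywhere in its title."""
    text = title.lower()
    return any(word in text for word in WATCH_LIST)


if __name__ == "__main__":
    for title in ["Mask of Chaos/The Star Virus", "Cloud Atlas"]:
        print(title, "->", "BLOCKED" if naive_is_blocked(title) else "ok")
```

Nothing in a bare substring match can tell pulp fiction from protective equipment, which is the whole problem.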

Added to JURN

Saturday 25 Apr 2020

Posted by futurilla in New titles added to JURN

Acta Baltica Historiae et Philosophiae Scientiarum (history of science).

Sound Stage Screen (sound and musicology of stage and cinema).

Journal of Juvenilia Studies (history of literary juvenilia, also amateur literary publishing by juveniles).

History of Classical Scholarship and the HCS Supplementary Volumes open book series.

Journal of Ancient Egyptian Interconnections (two-year paywall; March 2018 is the latest open issue).

Práticas da História: Journal on Theory, Historiography and Uses of the Past (Portugal; about 30-40% of each issue is in English).

Knowledge Organization (JURN indexes its open articles from 1974-2016, although this entails also indexing four years of paywalled articles, up to 2020. The options were i) ‘all to 1999’, with no paywalled articles, or ii) ‘all to 2020’, annoying searchers by including four years of paywalled articles. It was decided that some annoyance was worth it to get an extra 16 years of open content into JURN).


HortTechnology (horticultural technologies, with substantial cross-over to native plants, tackling invasive disease, and topics such as public education and landscape use).

On auto-downloading open access books

Tuesday 21 Apr 2020

Posted by futurilla in How to improve academic search, My general observations, Spotted in the news

Martin Paul Eve has a new post on Zotero and auto-downloading open access books…

all I really wanted was to be able to embed an ISBN and a citation_pdf_url and have Zotero do the lookup and save the file. However, out of the box there is no easy way to do this.

His test book is itself of interest, being his own new Close Reading with Computers: Textual Scholarship, Computational Formalism, and David Mitchell’s Cloud Atlas (April 2020), which applies textual computing to the science-fiction-philosophy novel Cloud Atlas.

I don’t use the current version of Zotero, so I’m unsure what advantages it would confer here. I assume Eve intended to find a way to automatically harvest all CC BY-SA books in PDF form, and so build a local collection for automated analysis.

But I see his book is already on OAPEN, the OA book aggregator catalogue. Theoretically then, since OAPEN is comprehensive and timely, one could have a harvester look at all the pages hanging off library.oapen.org/handle/ and save out only those pages carrying the required permissive CC “Rights” label. Each of these pages has a uniform PDF link URL in its HTML, of the form library.oapen.org/bitstream/, and these links could easily be extracted to a list. One would end up with a set of PDF links for a linkbot, ready to download to a local folder for computational analysis. I presume that’s what Eve intended to have Zotero do.

One would need to consult the OAPEN record page first, in the way I’ve suggested, since the PDF itself can carry different, non-uniform or even contradictory licence information. For instance, inside Eve’s book it is labelled both as “©” … “No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or in any information storage or retrieval system without the prior written permission of Stanford University Press.” and also as “Creative Commons Attribution-ShareAlike 4.0”.
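
The record-page filtering described above could be sketched in Python as follows. It assumes each record page under library.oapen.org/handle/ has already been fetched (politely, and with rate limiting), and that both the licence and the PDF appear as link or meta-tag URLs in its HTML; the exact OAPEN markup is an assumption, as are the function and class names. The by-sa licence path is the one mentioned in the post.

```python
from html.parser import HTMLParser

# Licence path taken from the post; other creativecommons.org paths could be used.
WANTED_LICENCE = "creativecommons.org/licenses/by-sa/"


class RecordPageParser(HTMLParser):
    """Collect every href/content value from one record page."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Assumed: licence and PDF appear in <a href> or <meta content>
        # (for example a citation_pdf_url meta tag).
        for name, value in attrs:
            if name in ("href", "content") and value:
                self.values.append(value)


def pdf_links_if_licensed(record_html, licence=WANTED_LICENCE):
    """Return the page's /bitstream/ PDF URLs, or [] if the licence is absent."""
    parser = RecordPageParser()
    parser.feed(record_html)
    parser.close()
    if not any(licence in v for v in parser.values):
        return []
    pdfs = [v for v in parser.values
            if "/bitstream/" in v and ".pdf" in v.lower()]
    return list(dict.fromkeys(pdfs))  # drop duplicates, keep page order


if __name__ == "__main__":
    # Invented sample markup, not a real OAPEN page.
    sample = (
        '<a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA</a>'
        '<meta name="citation_pdf_url" '
        'content="https://library.oapen.org/bitstream/handle/x/y/book.pdf">'
        '<a href="https://library.oapen.org/bitstream/handle/x/y/book.pdf">PDF</a>'
    )
    print(pdf_links_if_licensed(sample))
```

The returned URLs could then be written one per line to a text file for whatever download tool is used as the linkbot.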

How many items on OAPEN currently have a creativecommons.org/licenses/by-sa/ “Rights” label, as Martin’s book does? A Google site: search suggests around 650 titles. Half an hour of filtering the OAPEN CSV suggests the real number is just over 3,000, if one counts all forms of permissive CC licence that allow commercial use. That’s still a manageable harvest at present. But the supply of OA books and monographs is likely to grow rapidly in the near future, as a result of various OA mandates. So it might be a useful time-saver for text-miners and digital humanists if OAPEN were to maintain a single torrent of all the PDFs, inside which a half dozen folders would neatly organise the books by CC licence type. Such a one-click solution might save a lot of faffing around with digging into and filtering their XML and CSV feeds, wrangling with harvester scripts and timeouts, or trying to wrestle with third-party services such as Zotero. A torrent could also save OAPEN’s bandwidth.
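
The kind of CSV filtering mentioned above might look something like this in Python. The column name rights is a hypothetical placeholder, since the real OAPEN CSV headers aren’t given here, and the sketch counts only CC BY and CC BY-SA as ‘permissive, with commercial use’; CC0 or other licence strings could be added to the tuple.

```python
import csv
import io
from collections import Counter

# Assumed definition of "permissive, commercial use allowed": CC BY and CC BY-SA.
# The trailing slash stops ".../by/" from also matching ".../by-nc/" and so on.
PERMISSIVE = (
    "creativecommons.org/licenses/by/",
    "creativecommons.org/licenses/by-sa/",
)


def count_permissive(csv_text, licence_column="rights"):
    """Tally rows whose licence column contains a permissive CC licence URL.

    licence_column is a hypothetical header name; check the real OAPEN CSV.
    """
    tally = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(licence_column) or "").lower()
        for prefix in PERMISSIVE:
            if prefix in value:
                tally[prefix] += 1
                break  # count each row once
    return tally


if __name__ == "__main__":
    sample = (
        "title,rights\n"
        "Book A,https://creativecommons.org/licenses/by/4.0/\n"
        "Book B,https://creativecommons.org/licenses/by-sa/4.0/\n"
        "Book C,https://creativecommons.org/licenses/by-nc-nd/4.0/\n"
    )
    tally = count_permissive(sample)
    print(sum(tally.values()), dict(tally))
```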

Please become my patron at www.patreon.com/davehaden to help JURN survive and thrive.

JURN

  • JURN : directory of ejournals
  • JURN : main search-engine
  • JURN : openEco directory
  • JURN : repository search

    Categories

    • Academic search
    • Ecology additions
    • Economics of Open Access
    • How to improve academic search
    • JURN blogged
    • JURN metrics
    • JURN tips and tricks
    • JURN's Google watch
    • My general observations
    • New media journal articles
    • New titles added to JURN
    • Official and think-tank reports
    • Ooops!
    • Open Access publishing
    • Spotted in the news
    • Uncategorized
