Growing Knowledge

Growing Knowledge, a new website and set of videos from the British Library on the future of online research knowledge…

“How have digital technologies changed research? What are the new challenges they pose? What role should a research library play in the 21st Century? Growing Knowledge at the British Library explores these questions with our researchers in order to inform the debate on the future of research.”

An accompanying exhibition at the British Library runs until 11th July 2011.

Moving from Google Noise Reduction to Google Hit Hider

Firefox 4 final is out now. Sadly, it breaks the Greasemonkey script Google Noise Reduction, which was an excellent per-domain results blocker for Google Search.

However, the new and powerful Google Hit Hider works very well and is very similar to Noise Reduction. It has obviously learned a lot from earlier software such as Blocksite, Surfclarity and Noise Reduction (none of which now work with FF4 / the latest Google), and it adds some nice refinements, not least very easy import/export of blocklists as simple plain-text lists of URLs.

It’s a fairly simple process to get your hand-crafted Noise Reduction blocklist out of Firefox and into Google Hit Hider…

1. In Firefox’s address bar, type: about:config

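2. Scroll down to the preference named greasemonkey.scriptvals.http://exego.net//Google Noise Reduction.blacklist (typing greasemonkey into the Filter box at the top narrows the list). Its value will look something like this…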

({'britannia.com':true, 'oxfordjournals.org':true, 'tandf.co.uk':true, 'ingentaconnect.com':true, 'sagepub.com':true, 'myspace.com':true, 'experts-exchange.com':true})

3. Double-click that line, then copy the list of banned URLs from the dialog box that opens and paste it into Notepad.

4. Now top-and-tail the list (delete the opening ({ and the closing })), then search and replace to strip out the quote marks, the :true parts and the spaces, until you have a clean list of domains each separated by a single comma. Save the list as a .csv (comma-separated values) file, then open it in MS Office's Excel (or the free OpenOffice.org Calc). The list should load with one URL per cell.

5. Now just copy and paste the resulting cleaned list into: Manage Hiding / List Util / ‘Perma-ban list’ in Google Hit Hider.
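If you'd rather not do the top-and-tail and search-and-replace of steps 3 and 4 by hand, a few lines of script can do it. This is a minimal Python sketch, assuming the preference value looks like the example in step 2 (quoted domain keys, each set to true). It is not part of Noise Reduction or Google Hit Hider, and the names BLACKLIST_VALUE, ENTRY and extract_domains are purely illustrative. It prints the domains comma-separated (as in step 4) and also one per line, on the assumption that Hit Hider's Perma-ban list will accept one domain per line, like the plain-text lists it exports.

```python
import re

# Paste the value copied from about:config here (this is the example from step 2).
# Assumption: each key is a domain in straight single or double quotes, set to true.
BLACKLIST_VALUE = (
    "({'britannia.com':true, 'oxfordjournals.org':true, 'tandf.co.uk':true, "
    "'ingentaconnect.com':true, 'sagepub.com':true, 'myspace.com':true, "
    "'experts-exchange.com':true})"
)

# Matches a quoted key followed by ":true" and captures the domain inside the quotes.
# Entries stored as anything other than true (e.g. false) are deliberately skipped.
ENTRY = re.compile(r"""(['"])([^'"]+)\1\s*:\s*true""")


def extract_domains(value: str) -> list[str]:
    """Return the blocked domains in the order they appear in the preference value."""
    return [match.group(2) for match in ENTRY.finditer(value)]


if __name__ == "__main__":
    domains = extract_domains(BLACKLIST_VALUE)
    # Comma-separated, as in step 4 (save as .csv if you still want the Excel route).
    print(",".join(domains))
    print()
    # One domain per line: a plain-text list to paste into the Perma-ban list.
    print("\n".join(domains))
```

Paste your own blocklist value into BLACKLIST_VALUE before running it. Using a regular expression rather than trying to evaluate the value as JavaScript keeps the script simple and avoids running anything copied from the browser.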

The advantages of this over Google's now-native blocking are that: i) it lets you exceed Google's 500-URL limit; ii) you can block domains en masse rather than one at a time; and iii) you can easily import/export the blocklist, for instance to share it with colleagues.

An academic search group-test

Here's a little group test based on the single keyword Galerius, chosen simply because a search for his name recently turned up in JURN's usage statistics. The test looked for relevant, free, full-text journal articles or book chapters in English within the first three pages of results from each search tool, and found:

JURN's number of relevant results would have been higher if I had included four results from the Dictionary of Greek and Roman Biography and Mythology. Several other results were omitted because they seemed to contain only the briefest mention of Galerius. In total, JURN returned 329 results for the keyword Galerius, although not all were in English. If only 50 or so of those were highly relevant rather than incidental mentions of the name (and a slightly more targeted search for Galerius Romuliana gives 37 results that are very relevant to his main palace), that would still be quite a good haul for a newbie searcher using just a single keyword.

A couple of the best articles in English were pushed down to the last page of results, in amongst the non-English material, because they were on the French server Persée. Presumably the Google algorithm therefore classed them as “non-English” even though the articles are written in English (it's apparently not currently possible to turn off the location detector). So here's a useful tip for JURN users: always skip to the furthest results page and check what's there. It won't be the sort of junk and spam you'd see in a normal set of Google Search results.

The main Google Search test (see above graph and table) involved actually downloading the PDFs to check whether they really were articles or chapters rather than timelines, course documents or student essays. Both of the OAIster full-text results were in repositories. The single Archive.org result was a numismatic (coins) publication. All the HathiTrust results were from before 1910. The Google Book Search results included several pre-1910 books, among them three results for Gibbon's Decline and Fall.

Those lucky few with access to Project MUSE would have found 29 records in the results for this keyword, and JSTOR subscribers would have had a bumper crop of 55 quality full-text English results in the first three pages.