News from JURN

~ search tool for open access content

Category Archives: Spotted in the news

Paperwork – free desktop OCR and search, in open source

22 Tuesday Jan 2019

Posted by futurilla in JURN tips and tricks, Spotted in the news

Paperwork, a new addition to my “A quick guide to desktop search software” post…

Paperwork is free open-source software to help a scholar get to grips with their PDF pile, without hooking into some online service that wants to gouge your data. It OCRs all your PDFs and other documents and then searches across them quickly.

You need a big chunk of spare disk space, it seems, because if you have 25GB of stuffed-full folders, Paperwork will want to copy all of them over to its own C:\Users\YOURNAME\papers\ folder to OCR them. That makes sense, I guess, since you keep a copy of the original non-OCR’d file, but it comes at the cost of significant disk space for duplicated files.
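Before letting any tool duplicate a big PDF pile, it may be worth checking that the target drive has room. Here is a minimal Python sketch of that arithmetic. The function names and the 1.5 safety factor are my own assumptions for illustration, and nothing here is part of Paperwork itself.

```python
import os
import shutil

def folder_size_bytes(root):
    """Sum the sizes of all regular files found under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total

def has_room_for_copy(source_root, target_drive, safety_factor=1.5):
    """True if target_drive has enough free space for a duplicate of source_root.

    safety_factor is a guess (an assumption, not a measured figure): the
    OCR'd copies and any search index are likely to need some extra space.
    """
    needed = folder_size_bytes(source_root) * safety_factor
    free = shutil.disk_usage(target_drive).free
    return free >= needed
```

You would call has_room_for_copy() with your own PDF folder as the first argument and the drive that holds C:\Users\YOURNAME\ as the second. If it returns False, free up some space first.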

It comes with optional OCR interpreters for the world’s current languages, but so far as I can see it won’t do German ‘black letter’ (for which you need this).

Under “Settings” there is a “Send anonymous usage statistics” check-box, but this is turned off by default.

It looks good, but suffers from a non-standard Windows user interface which doesn’t appeal. But one could theoretically use it only as the software that watches your “Papers” folder and auto-OCRs any new PDF placed there (for which there seems no other free non-cloud competitor with a GUI). Then you’d point dtSearch at C:\Users\YOURNAME\papers\ for indexing, and use the powerful dtSearch interface for your actual searches.

At the Opera

03 Thursday Jan 2019

Posted by futurilla in JURN tips and tricks, Spotted in the news

Ah, finally! The latest version of my Opera Web browser (57.03.x) now supports pretty page-like display of raw .XML news feeds, when you encounter them via search or bookmarks. The feed display doesn’t offer an .MP3 button, but you can just right-click on the tile and “Save linked content as…” to get the .MP3 or similar media downloading.
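If you would rather pull the media links out of a feed yourself, the idea can be sketched in a few lines of Python. This assumes a plain RSS 2.0 feed that carries its media in enclosure tags. The sample feed and its example.org URL are made up for illustration, and this is not how Opera does it internally.

```python
import xml.etree.ElementTree as ET

def enclosure_urls(feed_xml):
    """Return (item title, media URL) pairs for each <enclosure> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        for enclosure in item.findall("enclosure"):
            url = enclosure.get("url")
            if url:
                pairs.append((title, url))
    return pairs

# A made-up two-item feed: one item with an .MP3, one text-only.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>Episode 1</title>
<enclosure url="https://example.org/ep1.mp3" type="audio/mpeg" length="123"/></item>
<item><title>Text-only post</title></item>
</channel></rss>"""

for title, url in enclosure_urls(SAMPLE):
    print(title, url)
```

Items without an enclosure are simply skipped, so a mixed feed of text posts and podcasts gives you only the downloadable media.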

Humanists’ digital workflows

05 Wednesday Dec 2018

Posted by futurilla in Academic search, JURN tips and tricks, Spotted in the news

New in DHQ: Digital Humanities Quarterly, “Researcher as Bricoleur: Contextualizing humanists’ digital workflows”. A small-scale observational study from 2016, building on a larger ‘Digital Scholarly Workflow’ study. The body is made up of case studies and commentary. Here’s the tale of a search by a historian for “1916” “November” “War Council”:

Audrey, a professor of history, searched for literature on an event that took place in 1916, and for which she had only partial information. Audrey’s search starts with her personal collection of notes written in Word and stored on the internal hard drive. She uses a Word search function that queries the folder for a supposed event name, but this search yields no result. Audrey then switches to her browser and the online search. She logs on to the Penn State library and enters a search phrase composed of three descriptors into the discovery search interface, LionSearch. This attempt does not yield any results either.

“Okay, no problem, I’m going to go to some of my favorite databases,” Audrey says optimistically, and, using the same search phrase, she continues her search in the Historical Abstracts database. “All right, I need another field. It happened in Rome,” she comments still optimistically, and expands her search with one more field, which reads “Rome.” Still nothing. “Seriously?!,” Audrey exclaims with annoyance. “All right, let me just do ‘war council,’ something more specific,” she says with reasserted optimism, and changes her search phrase accordingly. Failure again. “Really?!,” Audrey laments in shock. “I would have thought it was more important.” Audrey then reaches to her bookshelf and grabs a book. She reads through a few pages, trying to find any additional information that could help her search. Nothing. But Audrey is not ready to give up yet.

She returns to her library search and adds “November” as one more search field, trying to make her query as precise as possible. No results. Still, Audrey does not give up, and, instead of adding one more search term, she decides to change her search phrase. She creates a new search phrase, again composed of three descriptors as the possible event name. “Nope. All right, strange,” Audrey says quietly, confident that any further search would be pointless. “You would think someone must have written an article about this. It was the time that the different allies got together and hammered out a strategy…,” she continues murmuring, but discontinues her library search.

Instead, Audrey decides to try her luck with Google Search. She enters the search phrase and the Wikipedia entry pops up right away. “See, that’s the thing,” Audrey comments. “One would love to use more scholarly resources, but I just typed [the search phrase] and it’s up there [on Wikipedia]! Sadly, Historical Abstracts was not of too much use; the most useful one was still Wikipedia,” this historian concludes.

The problem here appears to be that the Supreme War Council of the three allies was created in November of 1917, not 1916. Only by switching the search terms from 1916 to 1917 does the Wikipedia page mentioned appear, so one has to suspect that there was some finessing of the search before hitting Google Search.

Run Opera? How to unpluck your search-engine results

28 Wednesday Nov 2018

Posted by futurilla in JURN tips and tricks, Ooops!, Spotted in the news

Has your ad-blocker (and other scripts) stopped working in the Opera Web browser today? It’s nothing to do with changes made by Google, Bing, Yandex etc.

What’s happened is that Opera has high-handedly decided to disable all adblocker and script-blocker addons from running on search-engine results pages. Thankfully, for now, the browser still has an option to turn off this unwanted and highly dangerous stupidity (disabling script-blockers etc) from the owners of Opera. Here’s the fix…

“For some reason Opera with the latest update have decided to add a new option for extensions that will disable them by default for “search page results”. You’ll have to go to top bar > Menu > Extensions > and then scroll down and tick the box “Allow access to search page results” for your addons. After that it will work normally again.”

You need to do this for each addon that affects search engines and their results, for example…

If you have a JURN link on your Google Search menu bar, via my UserScript, to get it back make sure to also enable TamperMonkey for Opera…

Error rates for Google Scholar citation parsing

15 Thursday Nov 2018

Posted by futurilla in Academic search, How to improve academic search, Spotted in the news

Another new prodding of Google Scholar, this time from the latest First Monday: “Testing Google Scholar bibliographic data: Estimating error rates for Google Scholar citation parsing”…

While data quality is good for journal articles and conference proceedings, books and edited collections are often wrongly described or have incomplete data. We identify a particular problem with material from online repositories [where there appears to be] considerable inhomogeneity in the implementation of data standards [and] a mismatch between repository software and the harvesting protocols employed by Google Scholar.

One of Scholar’s other problems is that it includes Google Books results. While 30% of the time its Google Books inclusions can be useful, there is no way to exclude Books results. One might want to exclude them because Scholar still can’t seem to tell a proper book from a robot-produced shovelware ebook that assembles public-domain content. Scholar has no ‘edition authority’ which states that the Joshi-edited and annotated Penguin Classics edition of H.P. Lovecraft’s “Dexter Ward” is the gold standard, with a text that has been fully corrected of the many textual errors, omissions and editing mistakes of previous decades. The same cannot be said of the public-domain shovelware ebooks that flood Amazon and (often) Google Books.

A basic undergraduate level search, for instance, for Lovecraft “Dexter Ward”, demonstrates the problem on the first page. Joshi is nowhere to be seen, and the searcher is hammered by links to shovelware ebooks (or worse), often with citation counts that suggest they are legitimate.

Google Scholar at 389 million

14 Wednesday Nov 2018

Posted by futurilla in Academic search, Spotted in the news

Michael Gusenbauer, “Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases”, Scientometrics, November 2018.

The findings provide first-time size estimates of ProQuest and EBSCOHost and indicate that Google Scholar’s size might have been underestimated so far by more than 50%. By our estimation Google Scholar, with 389 million records, is currently the most comprehensive academic search engine.

With the later proviso that there are likely to be many duplicates and near-duplicates, with such tools reporting…

the number of all indexed records on a database, not the number of unique records indexed. This means duplicates, incorrect links, or incorrectly indexed records are all included in the size metrics provided by ASEBDs.
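The gap between ‘all indexed records’ and ‘unique records’ is easy to show with a toy example. This is only a rough Python sketch under my own assumptions: records are simple (title, year) pairs, and two records count as duplicates when their titles match after crude normalisation. It is not the method used in the Scientometrics article.

```python
import re

def normalise(title):
    """Lower-case, strip punctuation and collapse whitespace, so near-identical titles match."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())

def count_records(records):
    """Return (all indexed records, unique records) using crude title-and-year matching."""
    unique = {(normalise(title), year) for title, year in records}
    return len(records), len(unique)

# Three made-up records, two of which are near-duplicates of each other.
records = [
    ("Open Access: A Survey", 2018),
    ("open access  a survey.", 2018),
    ("Another Paper", 2017),
]
print(count_records(records))
```

A size metric that reports only the first number will overstate coverage by however many duplicates and near-duplicates the index holds.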

As you can see, the article coins the ugly and unreadable “ASEBDs” for “academic search engines and bibliographic databases”. MASTs might be more mellifluous — Massive Academic Search Tools.

News you can lose…

03 Saturday Nov 2018

Posted by futurilla in Spotted in the news

It looks like stories from U.S. news outlets, those that blank UK and European visitors, are now simply being removed from the Google News results. Spotted today, under the Google News results…

Annoying for those inclined to turn on their VPN and see the news story regardless. But, in practice, the publications still blocking overseas visitors are such low-grade regional newspapers that it’s no loss.

Clips and flicks

03 Saturday Nov 2018

Posted by futurilla in Spotted in the news

All U.S. film-makers can now crack anti-copying technologies on content ($ paywalled at law.com), if they need that content for ‘fair use’ in a new production…

“Digital Millennium Copyright Act (DMCA) exemptions aren’t just for documentary filmmakers any more. The U.S. Copyright Office and Library of Congress last week broadened a DMCA exception to now allow more filmmakers to circumvent anti-copying technology and rip short video clips for purposes of commentary and criticism.”

However, it isn’t a free-for-all. Note that the PDF for the rules states that this new measure is specifically for…

“where the clip is used for parody or its biographical or historically significant nature”.

In a drama movie, the “commentary and criticism” would thus presumably be seen to be implied by the nature of the scene, rather than done in a directly academic or journalistic manner. For instance, I can imagine a dramatised scene of dancing on the beach as the Apollo 11 rocket lifts off behind the dancers. This scene would be a sort of implied commentary on the optimism engendered in the nation by the historically significant moment of sending men to the Moon. And if the high-res source needed for that was only available from Time-Life rather than NASA, then their Blu-ray disc could be cracked and a clip used as the background in the composite. Actually, these days it’s probably easier to do it with 3D models and a copy of Vue, but some may want the original footage, and historical personages can’t simply be conjured up in the same way.

Also, as the word “clip” is used and video is assumed in the PDF’s text, that leaves hazy the cracking of content protection to obtain a high-res still picture. A film-maker might need such a still for a Ken Burns “pan and scan” type film, and could perhaps argue that the still was required as an irreplaceable source needed to make the film’s video “clip”. But that’s probably something to be clarified in a future round of rule changes.

Retraction Watch Database

27 Saturday Oct 2018

Posted by futurilla in Spotted in the news

Retraction Watch now has a unified database of retracted papers.

Text Cleanup 2.0 – now free

24 Wednesday Oct 2018

Posted by futurilla in JURN tips and tricks, Spotted in the news

I’m pleased to see that Text Cleanup 2.0 is now freeware. It’s Windows desktop software from 2003 that “fixes” text automatically when you copy-paste it. For instance, by unwrapping a chunk of text that has hard line-breaks. Text Cleanup has a nice balance of power and ease-of-use, can save user presets, and still runs fine on a Windows 8.x desktop.
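For the curious, the ‘unwrapping’ idea itself is simple enough to sketch in Python. This is my own minimal illustration of the general technique, not Text Cleanup’s actual code: it joins hard-wrapped lines into paragraphs and treats a blank line as a paragraph break. It makes no attempt to rejoin words hyphenated across a line-break.

```python
import re

def unwrap(text):
    """Join hard-wrapped lines into paragraphs; blank lines still separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)

wrapped = "This is a\nhard-wrapped\nparagraph.\n\nSecond\nparagraph."
print(unwrap(wrapped))
```

Each paragraph comes back as a single line, ready to paste into a word processor that does its own wrapping.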


Please become my patron at www.patreon.com/davehaden to help JURN survive and thrive.
