News from JURN

~ search tool for open access content

Category Archives: Academic search

Error rates for Google Scholar citation parsing

15 Thursday Nov 2018

Posted by futurilla in Academic search, How to improve academic search, Spotted in the news

Another new prodding of Google Scholar, this time from the latest First Monday, “Testing Google Scholar bibliographic data: Estimating error rates for Google Scholar citation parsing”…

While data quality is good for journal articles and conference proceedings, books and edited collections are often wrongly described or have incomplete data. We identify a particular problem with material from online repositories [where there appears to be] considerable inhomogeneity in the implementation of data standards [and] a mismatch between repository software and the harvesting protocols employed by Google Scholar.

One of Scholar’s other problems is that it includes Google Books results. While 30% of the time its Google Books inclusions can be useful, there is no way to exclude Books results. One might want to exclude them because Scholar still can’t seem to tell a proper book from a robot-produced shovelware ebook that assembles public-domain content. Scholar has no ‘edition authority’ which states that the Joshi-edited and annotated Penguin Classics edition of H.P. Lovecraft’s “Dexter Ward” is the gold standard, with a text that has been fully corrected of the many textual errors, omissions and editing mistakes of previous decades, unlike the public-domain shovelware ebooks that flood Amazon and (often) Google Books.

A basic undergraduate level search, for instance, for Lovecraft “Dexter Ward”, demonstrates the problem on the first page. Joshi is nowhere to be seen, and the searcher is hammered by links to shovelware ebooks (or worse), often with citation counts that suggest they are legitimate.

Google Scholar at 389 million

14 Wednesday Nov 2018

Posted by futurilla in Academic search, Spotted in the news

Michael Gusenbauer, “Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases”, Scientometrics, November 2018.

The findings provide first-time size estimates of ProQuest and EBSCOHost and indicate that Google Scholar’s size might have been underestimated so far by more than 50%. By our estimation Google Scholar, with 389 million records, is currently the most comprehensive academic search engine.

With the later proviso that there are likely to be many duplicates and near-duplicates, with such tools reporting…

the number of all indexed records on a database, not the number of unique records indexed. This means duplicates, incorrect links, or incorrectly indexed records are all included in the size metrics provided by ASEBDs.
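To make the records-versus-uniques distinction concrete, here is a minimal sketch in Python. It is not how any of the tested services count, and the matching rule (normalised title plus year) and the function names are my own assumptions for illustration only.

```python
import re


def normalise(title):
    """Lower-case a title, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def size_metrics(records):
    """Return (all records, unique records) for a list of (title, year) tuples.

    Assumption: two records are 'the same' if the normalised title and
    the year match. Real near-duplicate detection is far harder.
    """
    unique_keys = {(normalise(title), year) for title, year in records}
    return len(records), len(unique_keys)


# Hypothetical records: the first two differ only in case and punctuation.
sample = [
    ("Example Title A", 2018),
    ("example title a.", 2018),
    ("Example Title B", 2018),
]
all_records, unique_records = size_metrics(sample)
```

A headline size figure corresponds to the first number; the second is the one a searcher actually cares about.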

As you can see, the article coins the ugly and unreadable “ASEBDs” for “academic search engines and bibliographic databases”. MASTs might be more mellifluous: Massive Academic Search Tools.

Open and Closed Articles in Norway

27 Thursday Sep 2018

Posted by futurilla in Academic search, Spotted in the news

“Grades of Openness. Open and Closed Articles in Norway” (August 2018)…

Based on the total scholarly article output of Norway, we investigated the coverage and degree of openness according to three bibliographic services: 1) Google Scholar, 2) oaDOI by Impact Story [now called Impactstory], and 3) 1findr [formerly oaFindr]. According to Google Scholar, we find that more than 70% of all Norwegian articles are openly available. However, degrees are profoundly lower according to oaDOI and 1findr, respectively 31% and 52%.

open shares vary considerably by discipline, with … the Humanities at the lower end

Access to journals in Pakistan

17 Friday Aug 2018

Posted by futurilla in Academic search, My general observations

African universities often have better access to journal databases than western counterparts, thanks to big aid deals for the continent, but I wondered if Pakistan has similar full-range access. I had a quick initial look at the journal-access situation in Pakistan, and soon found the national HEC Digital Library and its list of included databases and publishers…

“HEC National Digital Library (DL) is a[n official national] programme to provide researchers within public and private universities in Pakistan and non-profit research and development organizations with access to international scholarly literature based on electronic (online) delivery, providing access to high quality, peer-reviewed journals, databases, articles and e-Books across a wide range of disciplines.”

The supplied databases look like a wide selection and are available to bona fide institutions in Pakistan, though it looks like there’s a certain subset of databases reserved for larger institutions only.

Access there looks broadly comparable to that of a medium-sized university in the west, if “The impact of non-accessible library and information science journals on research productivity in Pakistan” in 2016 is anything to go by. It found, from Pakistan…

“18% non-accessible and 37% partially accessible LIS journals on the HEC subscribed databases.”

Though I note that, since then, Pakistan’s HEC Digital Library has added Gale, Oxford, ProQuest, and probably others, which has likely narrowed the gap.

A 2009 grassroots report found that the main obstacle to access was said to be the frequent power-cuts, rather than any lack of databases…

“the respondents emphasized that electricity failure is the main hindrance to access to the digital library and to the Internet”

Open Semantic Desktop Search – free desktop search for Windows

02 Thursday Aug 2018

Posted by futurilla in Academic search, JURN tips and tricks, Spotted in the news

Open Semantic Desktop Search is an “open source desktop search engine for full text search in documents” that runs on Solr on the Windows desktop, through Oracle’s free VM software VirtualBox. It’s been around since late 2015, and is actively being developed, but they obviously don’t employ a publicist to promote it.

It has a clean Web-like interface, and supports the indexing of a great many file-types including .ePUB and .PDF files, even if they’re inside .ZIP files. It can’t yet index the Kindle’s .MOBI ebook files, though, so you’d need to do an overnight mass-conversion to .ePUB or .PDF using the free Calibre software, and your purchased encrypted Kindle files will still need to be searched using Amazon.
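The overnight mass-conversion could be scripted rather than clicked through. Below is a rough Python sketch that hands each .MOBI file to Calibre’s `ebook-convert` command-line tool (input file, then output file). It assumes Calibre is installed and `ebook-convert` is on your PATH; the function names and folder layout are my own, and I have not run it against a large library.

```python
import subprocess
from pathlib import Path


def plan_conversions(folder, target_ext=".epub"):
    """Pair each .mobi file under `folder` with its planned output path."""
    pairs = []
    for src in sorted(Path(folder).rglob("*")):
        if src.is_file() and src.suffix.lower() == ".mobi":
            pairs.append((src, src.with_suffix(target_ext)))
    return pairs


def build_command(src, dst):
    """Command line for Calibre's ebook-convert: input file, then output file."""
    return ["ebook-convert", str(src), str(dst)]


def convert_all(folder, target_ext=".epub"):
    """Convert every not-yet-converted .mobi; return the list of files written."""
    written = []
    for src, dst in plan_conversions(folder, target_ext):
        if dst.exists():
            continue  # already converted on an earlier run
        # check=True raises CalledProcessError if a conversion fails,
        # which is likely for encrypted files.
        subprocess.run(build_command(src, dst), check=True)
        written.append(dst)
    return written
```

Calling `convert_all("path/to/ebooks")` (a placeholder path) would write an .ePUB beside each .MOBI, and a second run skips anything already converted, so the job can be resumed if the PC is switched off half-way.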

Despite being run in a VM (often slow on older Windows PCs), Open Semantic Desktop Search can work on…

“old standard hardware” and “The search engine works even offline or unhosted on a single laptop without need of a intranet or internet connection or a server.”

Though online comments suggest you’ll do best with a modern PC, and those with an over-stuffed hard-drive will need to clear 50Gb of disk-space to accommodate both the software and its resulting index. The disk-space needed may be less if you’re only indexing the folder containing the .PDFs and .ePUBs needed for your PhD or book research.

I haven’t installed and tested it yet, but it’s free and looks good. Apparently it can also auto-OCR inside PDFs that don’t have OCR text, a new feature added in a December 2017 update.

The search-engine software comes packaged in a 2.8Gb .OVA file that you download. This .OVA is a ready-made virtual appliance that you import into the free VM software VirtualBox (a 110Mb .EXE download), and the team’s Desktop Search page has instructions on how to load your .OVA into the installed VirtualBox. It seems fairly simple to get it up and running.

Evaluating Access of Open Access Research

19 Thursday Jul 2018

Posted by futurilla in Academic search, Spotted in the news

“Practicing What You Preach: Evaluating Access of Open Access Research” (2017)…

“To explore the effectiveness of the new OA [DOI-based] finding tools, the next step of the study used the Chrome extensions for Google Scholar, Lazy Scholar (LS), Unpaywall, and the Open Access Button (OAB) to look for green OA versions of paywalled articles. [At 160 articles] The study sample size was triple the amount of articles that Grandbois and Beheshti (2014) found in their study.”

4m Open Library books, full-text, deep search

14 Saturday Jul 2018

Posted by futurilla in Academic search, Spotted in the news

You can now ‘search inside’ all 4m Open Library books held at Archive.org, with your search seemingly constrained to just those books (and not the jumble that Archive.org also hosts). Nice results, with multi-snippets from deep inside the full-text of the books, plus phrase highlighting. This looks like excellent work, and it takes advantage of new tweaks by Archive.org’s search leader Giovanni Damiola.

A serious history researcher is still going to need to pound Archive.org itself and go through everything, but at first glance this seems to be a useful time-saver for those who only need to search the upper layers of the service.

The ultimate goal of the Open Library is “One Web page for every book ever published”. Think of it as one of those annoying university repositories where 95% of the full-text is not available yet, but will be one day… so “here’s a record page instead”. But in this case it’s for all books, and already has a substantial amount of full-text for free.

On ResearchGate

22 Tuesday May 2018

Posted by futurilla in Academic search, Official and think-tank reports, Spotted in the news

What publishers can take away from the latest early career researcher research ($), a five-page “Industry Update” for the journal Learned Publishing, 28th April 2018…

“ResearchGate is unquestionably the scholarly elephant in the room, which despite being just 10 years old boasts 15 million research members and is still growing at a rate of knots. … publisher offerings can look monastic and parochial by comparison. […] It looks rather like the new scholarly world order.” […] “Much depends on whether ECRs [early-career-researchers] take their millennial beliefs in sharing, openness, and transparency into leadership positions. [and if] publishers [start] feeding ResearchGate rather than competing with it – [making it] a publishing Amazon”.

The Update is by the team doing an industry-supported three-year cohort study of search and similar practices. Their first two reports are Early Career Researchers: the harbingers of change? Year One 2016 and now also the Year Two 2017 report, both free and public at the same website. Apparently the cohort of around 100+ is all science and social studies.

Also fairly new, and related, “ResearchGate and Academia.edu as networked socio-technical systems for scholarly communication: a literature review” (OA), in the Research in Learning Technology journal, 20th February 2018…

“a thorough understanding is still lacking of how these sites operate as networked socio-technical systems reshaping scholarly practices and academic identity. This article analyses 39 empirical studies published in peer-reviewed journals with a specific focus on ResearchGate and Academia.edu.”

Google Search currently suggests circa 72-million full-text PDFs at ResearchGate, although given the above Industry Update statement on ‘the 15m members’ we can probably assume some 10m of those PDFs are just CVs (which are nearly all excluded from JURN, by the way). Remove other fluff and I guess there might be circa 50m proper papers there. It would then be interesting to work out what “the uniques” are, by removing the papers freely available elsewhere in repositories and OA journals and suchlike. I’d very roughly guess that including ResearchGate PDFs in JURN may bring in some 5m to 8m papers not found elsewhere.
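For those who like to see the back-of-envelope sums laid out, here they are as a few lines of Python. The input figures are the rough guesses from the paragraph above (in millions); the ‘other fluff’ figure is not one I stated, it is simply what those guesses imply.

```python
# All figures in millions, and all rough guesses from the post above.
TOTAL_PDFS_M = 72        # circa, as suggested by Google Search
CV_PDFS_M = 10           # assumed to be CVs
PROPER_PAPERS_M = 50     # guessed 'proper papers' after removing fluff
UNIQUE_RANGE_M = (5, 8)  # guessed papers not found elsewhere


def implied_other_fluff(total, cvs, proper):
    """Fluff implied by the guesses: what is left after CVs and proper papers."""
    return total - cvs - proper


def unique_share(unique, proper):
    """Fraction of the proper papers that would be new to JURN."""
    return unique / proper


fluff_m = implied_other_fluff(TOTAL_PDFS_M, CV_PDFS_M, PROPER_PAPERS_M)
low_share = unique_share(UNIQUE_RANGE_M[0], PROPER_PAPERS_M)
high_share = unique_share(UNIQUE_RANGE_M[1], PROPER_PAPERS_M)
```

On those guesses the implied ‘other fluff’ is about 12m PDFs, and the uniques would be roughly 10% to 16% of the proper papers.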

NOA : Scientific Image Search

20 Sunday May 2018

Posted by futurilla in Academic search, Spotted in the news

NOA : Scientific Image Search. A project currently indexing 2.7 million free-to-use scientific images, extracted from CC-BY sources along with metadata and links. As you’d expect, ncbi.nlm.nih.gov dominates the URL links. I searched for European Lynx and had good results (big kitties), though nothing high-res.

The extracted images and their data are also being copied over to Wikimedia, where Google Images will pick them up after a while and offer high-res filters.

Incidentally, ncbi.nlm.nih.gov has its own public and official Open-i Biomedical Image Search Engine. A search for European Lynx shows it is indeed strictly biomedical.

Searching for Recent Anthropology and Archaeology Publications

20 Sunday May 2018

Posted by futurilla in Academic search, Spotted in the news

“Searching for Recent Anthropology and Archaeology Publications”, a frank new paper in the ANSS Currents (Anthropology Section of the American Library Assoc.) Spring 2018 issue. The authors examined the apparently rather severe shortcomings of commercial anthropology databases such as Anthropology Plus (EBSCO), when used to try to find recent 2013-2017 faculty papers/chapters needed to support undergraduate essay research.

While Berkeley anthropologists are prolific and well-known, their works remain hidden even in a systematic search.
