New paper: Academic Web Search Engines, 2014-2016

“An Evidence-Based Review of Academic Web Search Engines, 2014-2016”… “This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years”.

Useful. Interesting snippets from this excellent new summary survey:

* Weiss noted, “no critical studies seem to exist on the effect that Google Books might have on the contemporary reference experience” (Weiss 2016, 293). […] “Research is badly needed about the coverage and utility of both Google Books and Microsoft Academic.”

Seriously? None, not one single study from 2005-2015? For one of the most important innovations in books since Gutenberg? Wow. That’s one hell of a grudge you’re holding there, librarians.

* “In September 2016, Hug et al. […] noted Microsoft Academic has ‘grown massively from 83 million publication records in 2015 to 140 million in 2016’ […] As of February 2017 its index contains 120 million citations.”

Great news, which means I’ll have to take another look at that. I’m overdue for doing another big ‘group test’ of OA coverage in public search-engines, so this news may spur that. Of course, “citations” are not full-text, but 120m is impressive.

* “Bonato [2016] noted Google Scholar retrieved different results with Advanced and Basic searches”

So that’s another thing to take into account if I do another group-test this summer.

* A “glaring lack of research related to the [search] coverage of arts and humanities scholarship” [and specifically] “Little is known about coverage of arts and humanities by Google Scholar.” [and it is evident that arts and humanities scholars’] “preferences and behavior […] cannot be inferred from the vast literature focused on the sciences.”

* “research concerning the use of academic web search engines by undergraduates, community college students, high school students, and other groups would be welcome.”

* “Scholar results have been said to contain ‘clutter’.”

This is the closest the paper comes to mentioning all the predatory journals and similar dubious items, which get dragged into Scholar by automated collection bots.

* “During interviews of 20 historians by Martin and Quan-Haase (2016) concerning serendipity, five mentioned Google Books and Google Scholar as important for recreating serendipity of the physical library online.”

Yes, serendipity is vital. During search-based research it’s really more of a loosely chain-linked set of serendipity loops, interspersed with deep-dives to get tiny confirming nuggets of fact. For example: was Borges correct when he suggested that The Time Machine’s famous central motif of ‘the future-flower’ was almost certainly not influenced by a striking passage in Coleridge’s notebooks? Yes he was, presumably having confirmed it through a private letter of enquiry to some learned bibliophile in London. But he was characteristically recondite on this point in the essay, and so he can only be proved correct if you make the 30-minute deep-dive into the primary sources needed to get the exact month-of-publication dates in 1895.

* “arts and humanities scholars […] commonly expressed the belief that having a complete list of research activities online improves public awareness [with] the enormous potential for this tool’s use.”

Might be more useful to have a rolling list of what’s not being done but needs to be done. Sort of like a speculative Kickstarter, only you’d gather people rather than cash.

* “Gardner (2016) showed […] people working in the humanities and religion and theology prefer to use Google”. “Humanities scholar use of Google over Google Scholar was also found by Kemman et al. (2013); Google, Google Images, Google Scholar, and YouTube were used more than JSTOR or other library databases”

* “Namei and Young’s [2015] comparison of Summon, Google Scholar, and Google using 299 known-item queries. They found Google Scholar and Summon returned relevant results 74% of the time; Google returned relevant results 91% of the time.”

* “In Yang’s (2016) study of Texas Tech’s DSpace IR [the university repository], Google was the only search engine that indexed, discovered, or linked to PDF files supplemented with metadata; Google Scholar did not discover or provide links to the IR’s PDF files, and was less successful at discovering metadata.”

I’m guessing this illustrates the value of separating a university’s big dumpy Digital Collections from the nimble research repository by putting them on different domains? Texas Tech’s DSpace has them both cheek-by-jowl, and adds a Law repository for good measure.

* “IR platform and metadata schema dramatically affect discovery, with some IRs nearly invisible (Weideman 2015; Chen 2014; Orduña-Malea and López-Cózar 2015; Yang 2016) and others somewhat findable by Google Scholar (Lee et al. 2015; Obrien et al. 2016).”

* “Another area needing investigation is the visibility of links to free full text in Google Scholar.” [and more generally] “retrieval of full text, which is another area ripe for more research studies, especially in light of the impressive quantity of full text that can be retrieved without user authentication.” […] “When will academic users find a good-enough selection of full-text articles that they no longer need the expanded full text paid for by their institutions?”

Indeed.

There are also good formulations of four future-research questions specific to the arts and humanities (pages 27-28).

News from JURN

~ search tool for open access content