Interesting new paper in PLOS ONE, “The Role of Google Scholar in Evidence Reviews and Its Applicability to Grey Literature Searching”.
Test searches were drawn from review papers…
“…chosen as they covered a diverse range of topics in environmental management and conservation, and included interdisciplinary elements relevant to public health, social sciences and molecular biology.”
… and compared alongside Web of Science results…
Surprisingly, the authors found relatively little overlap between Google Scholar and Web of Science (10–67% of WoS results were returned by title searches in Google Scholar).
Unsurprisingly, Google Scholar wasn’t found to be the one-stop shop many assume it to be…
“…some important evidence was not identified at all by Google Scholar … [so it] should not be used as a standalone resource in evidence-gathering exercises such as systematic [literature] reviews.”
Interesting finding also that…
“‘Peak’ grey literature content (i.e. the point at which the volume of grey literature per page of search results was at its highest and where the bulk of grey literature is found) occurred [in Google Scholar] on average at page 80 (±15 (SD)) for full text results … page 35 (±25 (SD)) for title [search] results.”
So this suggests that, if seeking grey literature with a very well-formed topic search, one might usefully skip ahead to around result 700 (of 1,000) and work through a few hundred results from there. By “well-formed” I mean the sort of sophisticated, literature-review style of search-term chaining used in this study, for example…
“oil palm” AND tropic* AND (diversity OR richness OR abundance OR similarity OR composition OR community OR deforestation OR “land use change” OR fragmentation OR “habitat loss” OR connectivity OR “functional diversity” OR ecosystem OR displacement)
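Out of curiosity, here is a rough sketch (mine, not anything from the paper) of what “skipping ahead and working forward” could look like in practice. Google Scholar shows ten results per page, so pages 70–80 correspond roughly to results 700–800. The sketch assumes Scholar’s public q and start URL parameters, where start offsets the result list in steps of ten; note that hammering these pages automatically is exactly what triggers the blocking the authors describe at the end of this post.

```python
# Rough sketch (not from the paper): build Google Scholar URLs that jump
# straight to deep result pages for the example search string. Assumes the
# public ?q= and &start= URL parameters; start offsets results by item
# (10 per page), and Scholar caps the visible result list at ~1,000 items.
from urllib.parse import urlencode

query = (
    '"oil palm" AND tropic* AND (diversity OR richness OR abundance OR '
    'similarity OR composition OR community OR deforestation OR '
    '"land use change" OR fragmentation OR "habitat loss" OR connectivity OR '
    '"functional diversity" OR ecosystem OR displacement)'
)

def scholar_url(query: str, start: int = 700) -> str:
    """Return a Google Scholar search URL beginning at the given result offset."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": query, "start": start})

# Work forward a few hundred results from around result 700:
for offset in range(700, 1000, 10):
    print(scholar_url(query, offset))
```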
It appears that the researchers only auto-extracted “citation records” from the search results and then classified them into broad categories based on those records alone. There appears to have been no checking of link validity, nor any downloading and scrutiny of PDFs. So there are no measurements of how many of Google Scholar’s links work, or of how many lead to free, non-paywalled full-text articles.
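For what it’s worth, this is roughly the kind of check that seems to be missing, sketched as I would imagine it (my illustration, not the authors’ method): take each extracted URL, see whether it still resolves, and whether it points at a PDF. The requests library and the example URL are my assumptions; genuinely detecting paywalls would need more than a content-type check.

```python
# Minimal sketch (my own illustration, not the paper's method) of a link check:
# does each extracted URL still resolve, and does it point at a PDF?
# Uses the third-party "requests" library; a real check would also need to
# detect paywall/login pages, which this does not attempt.
import requests

def check_link(url: str, timeout: float = 10.0) -> dict:
    """Return basic reachability and content-type info for one citation URL."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code >= 400:
            # Some servers reject HEAD requests; retry with a lightweight GET.
            resp = requests.get(url, stream=True, timeout=timeout)
        content_type = resp.headers.get("Content-Type", "")
        return {
            "url": url,
            "ok": resp.status_code < 400,
            "status": resp.status_code,
            "looks_like_pdf": "pdf" in content_type.lower(),
        }
    except requests.RequestException as exc:
        return {"url": url, "ok": False, "status": None, "error": str(exc)}

# Example with a hypothetical extracted record:
# print(check_link("https://example.org/paper.pdf"))
```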
Lastly, I noted…
“Google Scholar has a low threshold for repetitive activity that triggers an automated block to a user’s IP address (in our experience the export of approximately 180 citations or 180 individual searches). Thankfully this can be readily circumvented with the use of IP-mirroring software such as Hola (https://hola.org/).”
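If you would rather not route traffic through a proxy, an obvious alternative (my suggestion, not the paper’s) is simply to stay under that reported ~180-action threshold by batching and pausing. A toy sketch, with the safety margin and cool-off period being guesses rather than documented limits:

```python
# Toy sketch (an assumption on my part, not from the paper): pause a scripted
# search/export workflow well before the ~180-action block threshold the
# authors report, instead of relying on IP-mirroring tools.
import time

BLOCK_THRESHOLD = 180   # approximate trigger reported by the authors
SAFETY_MARGIN = 30      # stop early; the exact threshold is not documented
PAUSE_SECONDS = 60 * 60 # arbitrary cool-off period before resuming

def run_in_batches(actions):
    """Run search/export callables, pausing before the reported block threshold."""
    count = 0
    for action in actions:
        if count >= BLOCK_THRESHOLD - SAFETY_MARGIN:
            time.sleep(PAUSE_SECONDS)
            count = 0
        action()
        count += 1
```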