So, what was learned from yesterday’s Group test: repository search?

1. Finding PhD theses by using a Google-based search tool is currently not an easy task, even when the tool limits Google to academic repositories. Are you trying to discover if someone has already written a PhD on your brilliant new idea for a PhD or book topic? Then don’t expect that a small flock of related open access PhD theses is going to tumble into your lap(top) via a Google search. If your topic idea is in a fairly obscure and little-covered nook, then you’re going to have to dig. (Advice from others via social media may help somewhat, but probably can’t be relied on to be thorough and comprehensive, and asking the questions required may risk giving away your precious topic idea.)

2. The Google Search trick of adding “submitted in” to help surface full-text theses does work, after a fashion. It’s clunky and cringe-inducing, admittedly. One might use it in conjunction with other commonly used title page items such as “requirements for * degree” or simply “supervisor”, etc., which would help weed out government reports, project pages and calls for research. Such a search might usefully jettison filetype:pdf and thus roam freely across PDF, DOC, HTML and TXT.
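As a rough illustration of the trick in point 2, here is a minimal Python sketch that assembles such a query. The function names `build_thesis_query` and `search_url`, and the example topic, are my own assumptions for illustration. The sketch only builds the query text and a standard Google search URL, and does not fetch or parse any results.

```python
from urllib.parse import urlencode


def build_thesis_query(topic, extra_phrases=("requirements for * degree",), pdf_only=False):
    """Assemble a Google query string aimed at thesis title pages.

    topic         -- plain keywords for the subject (hypothetical example below)
    extra_phrases -- further title-page phrases, each sent as a quoted phrase
    pdf_only      -- leave False to roam across PDF, DOC, HTML and TXT
    """
    parts = [topic, '"submitted in"']
    # Quote each extra phrase so Google treats it as an exact phrase;
    # the * inside a quoted phrase acts as a wildcard word.
    parts += [f'"{phrase}"' for phrase in extra_phrases]
    if pdf_only:
        parts.append("filetype:pdf")
    return " ".join(parts)


def search_url(query):
    """Turn the query text into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_thesis_query("arctic permafrost methane")
print(query)
print(search_url(query))
```

Swapping `extra_phrases=("supervisor",)` in place of the default gives the looser variant mentioned above, and `pdf_only=True` puts the filetype restriction back if the results become too noisy.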

3. Google has a titling problem with repository PDFs, even well-formatted thesis documents, in terms of extracting the title and using it as the URL anchor text. Google appears to rank a search result higher when its system can be certain that the title of the article is known. That is no doubt a fine service for mainstream “I only look at the first ten links” end users, but it also has the side effect of masking the problem. The problem becomes very evident by the second page of a narrow search using a Google-based repository search tool, when laughable link titles such as “View/Open” and “thesis.pdf” can appear. Some repositories are finally becoming a little more Google-friendly, which may start to improve such matters. But the problem may also be one of the sheer volume of PDFs Google needs to process through an automatic metadata identification and extraction tool, as well as one of having clean Google-friendly site-maps from repositories.

If governments were to mandate not just OA, but full Google compliance by OA repositories, then I suspect that might help matters enormously.

Doubtless Google search scientists continue to work hard on automated document title inference, and future tweaks to Google’s growing machine intelligence may suddenly solve this problem. Though the situation for JURN has improved dramatically over the last six years, even without assistance from advanced kitten-trained uber-bots.

4. On the evidence of this group test alone, there seems little overlap between the PhDs listed on Summon and those to be found via OpenDOAR / GRAFT / Google Search / Scholar. This also seems true of OAIster and the British Library PhD search (I see the latter now has a useful OA indicator icon alongside results, and has generally become much more usable). So the standard advice that people need to cover multiple search tools to conduct a decent literature search remains good advice, and seems especially pertinent to those who operate outside of academia or large corporations.

5. Microsoft Academic does not seem to be returning results based on an indexing of the full-text.

6. Impressive claims made about the size of FreeFullPDF seem to be overstated. For this search it could only return 32 results.

7. $14.6 million has been spent by the EU on the OpenAIRE portal, and yet it has consistently performed very poorly on JURN’s other group tests. But on this test it actually proved useful. It found two good theses on the microbial aspects of Arctic permafrost methane cycling, which no Google-based tool could surface.

8. Masters and undergraduate dissertations are commonly confused with each other and with PhD theses, in search results. This is the case even with Summon’s “Dissertations” facet, which appears to jumble undergraduate / Masters / PhD work into the same search results. Presumably it’s just not worth the grunt work needed to sort and tag them by type, especially since a humble one-year Masters dissertation is often aggrandised by referring to it as a thesis.

9. There seems to be huge scope for a small field, such as Film Studies, to build its own hand-crafted curated index of open access PhD theses. (Tip: look at WordPress for that, not Omeka. In my view Omeka is not yet suitable for such tasks, having recently used it myself to build a catalogue of local natural history publications.)

10. Finally, I might note that it’s curious that there has apparently been so little PhD thesis activity on the topic since 1980, given the huge commercial and political interest in gas hydrates and methane release. It’s perhaps understandable that some exploration / energy sector theses would be commercially sensitive, and thus not available. But a comparison search of Summon suggests that’s only the case in a handful of theses. A search of OAIster for the super-wide search methane arctic also gives a mere 15 theses/dissertations in English, six of those being on prehistoric time periods. Given all the unknowns around frozen hydrates, including the environmental impacts, one would have expected to have discovered a wave of theses on frozen hydrates and the environmental interactions. Perhaps it is simply too expensive for grad students to travel to the permafrost Arctic for weeks of work, though there are said to be many grants available for anything related to global warming in the Arctic. Or do grad students in the field simply prefer to publish in group expedition reports and co-authored journal articles? Maybe my thinking is just too individualistic, since I come from the humanities, and I am not sufficiently in tune with the collectivised nature of modern science? Perhaps that’s so, but then that should not have prevented individual PhDs on hot topics such as: the geopolitical dimensions of Arctic hydrates; world energy security futures; industrial extraction methods; international policy and treaties; or the ‘poster child’ use of permafrost hydrates in the rhetoric around greenhouse warming and the so-called ‘methane bomb’ / ‘carbon bomb’. So I just note here the curious mismatch between the media coverage and the apparent lack of interest shown by graduate students.