New research on Google Flu Trends

One of Google’s public data-driven prediction systems has caught a cold, according to weighty new research…

“Google Flu Trends, which launched in 2008, monitors web searches across the US to find terms associated with flu activity such as “cough” or “fever”. It uses those searches to predict up to nine weeks in advance the number of flu-related doctors’ visits that are likely to be made. The system has consistently overestimated flu-related visits over the past three years, and was especially inaccurate around the peak of flu season — when such data is most useful.”

The doctors prescribe taking a healthy dose of national health statistics…

“Merely projecting current CDC data [doctors’ visits as recorded at the US Centers for Disease Control and Prevention] three weeks into the future yields more accurate results than those compiled by Google Flu Trends. Combining the two resulted in the most accurate model of all.”

One has to wonder, though, about prediction feedback loops here. What if Google Flu Trends was actually right, but Trends-watching doctors, carers and the public all put into effect various extra measures that stopped the Trends prediction from coming true in the longer-term six-to-nine week window? Or what about some kind of media amplification loop: more media chatter hits the news as the epidemic surfaces into the public mood, meaning that non-sufferers start using the relevant keywords more in social media?

Meagre harvest gleanings

Knoth, Petr (2013). “From open access metadata to open access content: two principles for increased visibility of open access content”, conference paper presented at: Open Repositories 2013, 8th-12th July 2013, Charlottetown, Canada.

… only 27.6% of research outputs in repositories are linked to content that can be downloaded by automatic means and analysed (e.g. indexed). […] the median repository will only provide machine readable content for 13% of its deposited resources. [but] it is likely that these statistics are in fact rather optimistic …

“And the Animals Came Two by Two…”

Perhaps it’s down to the influence of the publicity for the new Noah movie (heh), but I’ve made various additions today that mean JURN now has reasonable coverage of open access ecology and ornithology (birds) journals. Or perhaps it’s just because they’re currently a nicely compact set of ejournals and open resources, and as such are fairly easy to include. Thanks to Writing for Nature for his recent trawling and filtering of the DOAJ for core ecology titles, and to Ornithology Exchange for a big and fairly current list of ejournals in ornithology, complete with a handy side-table linking to any open access volumes. JURN is, for now, only indexing the more current of the OA titles on the Ornithology Exchange list.

GoogleMonkeyR temporary fix

It seems that Google Search have committed to their new code for displaying Google Search results, after trialling the changes last week and then withdrawing them. The changes break the vital browser addon GoogleMonkeyR. A temporary fix is to edit the GoogleMonkeyR userscript thus…

Find…

var list = document.getElementsByXPath(".//div[@id='ires']/ol/li[starts-with(@class,'g')]/div/parent::li");

Replace with…

var list = document.getElementsByXPath(".//div[@id='ires']/ol/div[starts-with(@class,'srg')]/li");

Confirmed as working with Google.com search. Fails when you switch the keyword through to Google News.
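
For anyone who would rather not swap one hard-coded path for another, here is a rough sketch of a fallback approach: try the new XPath first, then the old one, and use whichever returns results. The two XPath strings are the ones given above. The names findResultItems and domEvaluator are hypothetical helpers of mine, not part of GoogleMonkeyR, and the browser half relies only on the standard document.evaluate API. I have not wired this into the userscript itself, so treat it as an illustration rather than a tested patch.

```javascript
// Candidate XPath expressions, newest Google layout first.
// Both strings are copied from the find/replace step above.
const RESULT_XPATHS = [
  ".//div[@id='ires']/ol/div[starts-with(@class,'srg')]/li",
  ".//div[@id='ires']/ol/li[starts-with(@class,'g')]/div/parent::li",
];

// Hypothetical helper (not part of GoogleMonkeyR): `evaluate` is any function
// that takes an XPath string and returns an array-like of matching nodes.
// Returns the first XPath that matched something, plus its nodes.
function findResultItems(evaluate, xpaths = RESULT_XPATHS) {
  for (const xpath of xpaths) {
    const nodes = evaluate(xpath);
    if (nodes && nodes.length > 0) {
      return { xpath, nodes: Array.from(nodes) };
    }
  }
  // Nothing matched: the page layout is one we do not know about.
  return { xpath: null, nodes: [] };
}

// Hypothetical browser-side evaluator built on the standard DOM XPath API.
function domEvaluator(doc) {
  return (xpath) => {
    const snapshot = doc.evaluate(
      xpath, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    const nodes = [];
    for (let i = 0; i < snapshot.snapshotLength; i++) {
      nodes.push(snapshot.snapshotItem(i));
    }
    return nodes;
  };
}
```

In a browser this would be called as findResultItems(domEvaluator(document)). An empty result is the signal that Google has changed the markup again, which is presumably what is happening on Google News.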

UPDATE, NOV 2014.

Still working fine for me, with a few tweaks…

1. Updated Greasemonkey to 2.3 (29th Oct 2014) and GoogleMonkeyR to 1.7.2.

2. I access Google Search via this URL, which has a parameter that limits search results to 15 per page…

https://www.google.com/webhp?hl=en&complete=0&tbo=1&num=15&tbs=li:1

15 fits nicely in three columns, which I also have set up in GoogleMonkeyR Prefs (the cog-wheel that appears top-right once you make a Google search).

3. Hide the “Searches related to test” element on the Google Search results page by using the Adblock Plus addon (right-click on “Searches related to test”, choose ‘Inspect Element’, highlight the whole ‘extrares’ element, click on the red Adblock Plus icon, block). This bit gets hidden because otherwise it sits awkwardly between you and the numbered links that lead to the subsequent results pages.
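
If you want a different number of results per page, the URL in step 2 can be assembled from its parts rather than edited by hand. Below is a small sketch using the standard URL and URLSearchParams objects. The function name buildSearchUrl is my own invention, and the parameter values are simply copied from the URL above. The only one I can vouch for from this post is num, which sets the results per page.

```javascript
// Hypothetical helper: rebuilds the Google start-page URL from step 2,
// with the results-per-page value made adjustable.
function buildSearchUrl(resultsPerPage = 15) {
  const url = new URL("https://www.google.com/webhp");
  // Values below are copied verbatim from the URL in step 2.
  url.searchParams.set("hl", "en");
  url.searchParams.set("complete", "0");
  url.searchParams.set("tbo", "1");
  url.searchParams.set("num", String(resultsPerPage)); // results per page
  url.searchParams.set("tbs", "li:1");
  return url.toString();
}

console.log(buildSearchUrl());   // 15 per page, three tidy columns of five
console.log(buildSearchUrl(30)); // a longer page, if you prefer
```

One difference to expect: URLSearchParams percent-encodes the colon, so the generated URL ends in tbs=li%3A1 rather than tbs=li:1. The two forms decode to the same value.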

Dutch OA indexed at just 11% in Web of Science

Wouter has today posted a Powerpoint with a slide showing the number of Dutch open access articles and reviews indexed in Web of Science, 1995-2015…

[Image: open-access-in-wos]

It’s good to see coverage is ‘on the up’, but it seems that open access journal content from the Netherlands is currently indexed in WoS at just 11%. This is another indication of the low level of OA journal article discoverability in big commercial databases, and a reminder that the coming Google Scholar / Web of Science combo interface won’t make Scholar a one-stop shop for finding open access articles.

Indexing of OA Communication Studies journals

More new research on Open Access ejournal penetration into commercial journal indexing databases: “Open Access Journals in Communication Studies: Indexing in Five Commercial Databases” (2014). Only…

32 percent of the 147 gold OA journals identified [in the field of Communication Studies] were indexed in five major commercial bibliographical databases commonly subscribed to by academic libraries [including Scopus, EBSCO Complete, Web of Science]

Global Social Science & Humanities Publishing 2013-2014

Joseph Esposito has usefully had a peek inside a very expensive commercial market report titled Global Social Science & Humanities Publishing 2013-2014.

Social/Humanities publishing is found to be perhaps 25% of the size of Science/Technology/Medicine, at around $5bn. That actually strikes me as something of an achievement, when you consider that we have far smaller research funding inputs and a smaller technical/training infrastructure to call on. But perhaps the $5bn figure is given a strong boost by teacher training textbooks, social work manuals and the like?

Joseph highlights the report’s finding of a highly fragmented market. This market fragmentation is one of the reasons I’m skeptical about the success of a ‘one metadata to rule them all’ solution to OA indexing and discovery. It seems that DOAJ-listed OA journal titles can’t even find their way in full-text into the largest of commercial databases (such as EBSCO Complete) at a rate any higher than just over 20%. When last heard of, the Web of Science / Scopus seemed to be barely scraping 1,000 OA titles indexed. One art history study found that Google Scholar could index only half the DOAJ’s OA art history titles.

A dastardly conspiracy to keep OA titles out of these big indexes seems unlikely. So I suspect it’s largely due to many OA editors in the arts and humanities not giving a fig about providing the means to automatically index their content. Their widespread lack of something as basic as RSS feeds seems to confirm that. Add to that the fact that only 56% of DOAJ journals can supply the DOAJ with article metadata. Persuading non-librarian types to do something as simple as tagging all their back-issue content with some simple new machine-readable OA tag thus seems rather a long shot. Persuading mainstream publishers to do the same? Well… maybe, but what’s their incentive for that? Even if they do, will they allow mass harvesting of the OA articles? Nor are librarians likely to be of much use after the fact of publication, since they seem to have mostly failed to apply even their own metadata standards to open content, and open repository metadata quality is reported to be dire.