No babies = no humanities

Government decrees closure of all humanities degrees. No, it’s not another crazed Putin pronouncement from Russia. It’s sober Japan…

“Many social sciences and humanities faculties in Japan are to close after universities were ordered to “serve areas that better meet society’s needs”. Of the 60 national universities which offer courses in these disciplines, 26 have confirmed they will either close or scale back their relevant faculties at the behest of Japan’s government. … 17 national universities will stop recruiting students to humanities…”

“…[the move in Japan is] linked to a low birth rate and falling numbers of students, which has led to many institutions running at less than 50 per cent of capacity.”

And the situation is only likely to get worse. In population terms Japan is headed back to where it was in 1955…

[Population graph omitted.] Source: IPSS (National Institute of Population and Social Security Research) via IEEE.

So it seems that the same blanket closure of humanities departments may soon be forced on other nations in steep demographic decline, such as Russia and most of post-socialist Eastern Europe. Possibly even southern Italy.

Which is one reason why I’ve been so pleased to see the amazing baby boom we’ve been having here in the UK over the past five years, which shows no sign of stopping any time soon. There seem to be babies and toddlers everywhere you look, and supermarkets now routinely dedicate two double-sided aisles to romper-suits, nappies, toddler clothes, baby food and the like. Midwives are worked off their feet, and infant school reception classes are so full that the kids are almost falling out the windows. And I’d take a bet that these kids are going to be remaking and reinventing British youth culture circa 2023-28, and then surging into the universities circa 2026-35.

JISC benchmarking tool for OA in the UK

A handy benchmarking tool for OA in the UK

“CIAO is a benchmarking tool for assessing institutional readiness for Open Access (OA) compliance … produced as part of the JISC OA Pathfinder…”


Looks good, but omits the utterly vital element of ‘Public, Peer and Government Discovery’. I’d suggest adding an extra strip with the following wording/steps…

ENVISIONING: We do not know what proportion of our OA repository contents can be found via public search-engines, or the quality of the search results that link to our repository.

DISCOVERING: We are considering the most effective steps to improve our repository coverage in public search-engines, and are taking advantage of guides and free consultancy work offered by staff at major search engines such as Google. We will rank the priority of these steps by both their likely impact on discoverability and ease of implementation.

DESIGNING & PILOTING: We have committed funds to implement and test at least ten commonly recommended methods that will increase our repository’s coverage in the public search-engines. Graduate interns have been recruited to aid the repository staff during this period.

ROLLING OUT: The planned measures have been turned on or implemented. Systems and staff are in place, and best practice workflows have been clearly documented and disseminated. Search engine indexing of our repository content is being tested to gather reliable metrics on: increased indexing coverage; time to index new content; and search result quality. We are also internally monitoring visitor traffic and open/dwell rates.

EMBEDDING: We are examining further measures to boost the quality of the public search results for our repository content, such as ensuring that the document title is used in the search result’s Web link. We are considering acquiring funds to undertake certain large-scale measures once deemed too expensive to implement, such as retrospectively re-working the university-branded cover-pages applied to our PDFs. Senior staff have recognized that Web traffic to our OA repository represents a valuable branding, outreach and recruitment opportunity. The repository is no longer seen as a drain on resources or as general-use web storage for the university.

What’s in Common Crawl?

The Common Crawl service is an open monthly crawl of the Web. It currently weighs in at a whopping 145TB, and is seemingly limited to Web sites with high-ranking inbound links.

How does one discover if a URL is being crawled for Common Crawl? With the Common Crawl Index, a URL lookup tool for Common Crawl. From there it’s apparently fairly easy to work down the tree to identify and extract, say, just the Wikipedia segments from the monthly crawl.

A test with some randomly selected (and rather more obscure than otherwise) JURN URLs suggests that Common Crawl does indeed visit a wide range of URLs. You can see which actual pages on a domain are being indexed by Common Crawl, by swapping your own domain in for *.jprstudies.org in this…

http://index.commoncrawl.org/CC-MAIN-2015-06-index?url=*.jprstudies.org&output=json
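Out of curiosity, that index output can be consumed with a few lines of Python. A minimal sketch: the index returns one JSON record per line (not a single JSON array), and the two sample records below are invented for illustration, though the urlkey/url/mime/status field names match what the index emits.

```python
import json

def parse_cc_index(jsonl_text):
    """Parse the line-delimited JSON returned by the Common Crawl
    index: one capture record per line, not a single JSON array."""
    return [json.loads(line)
            for line in jsonl_text.splitlines() if line.strip()]

# Invented sample records, in the shape the index query above returns
# (real records also carry offset/length/filename fields that point
# into the underlying WARC archive files).
sample = (
    '{"urlkey": "org,jprstudies)/", "timestamp": "20150206000000", '
    '"url": "http://jprstudies.org/", "mime": "text/html", "status": "200"}\n'
    '{"urlkey": "org,jprstudies)/about", "timestamp": "20150206000000", '
    '"url": "http://jprstudies.org/about", "mime": "text/html", "status": "200"}'
)
for record in parse_cc_index(sample):
    print(record["url"], record["mime"], record["status"])
```

In practice you would fetch the JSON from the index URL above (with urllib or similar) and feed the response body to the same parser.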

In the above instance, the crawler doesn’t appear to be going very deep into the heaving bosom of the Journal of Popular Romance Studies. To create a version of JURN on Common Crawl, the crawler would need to be explicitly told to do a deep harvest of each URL, rather than only collecting pages with high-ranking inbound links. That’s my guess, and it might explain the sparse harvest of jprstudies.org (see above). This guess seemed to be confirmed when I found a Common Crawl forum comment from August 2015 by Tom Morris…

“…given a fixed budget, focusing on crawling entire domains, whether by using sitemaps or other means, will, necessarily, reduce the number of domains which are crawled. Focusing on crawling all structured product data will mean sacrificing crawling popular pages.”

So the JSON output for the above link effectively tells you what your domain’s most popular pages are, as judged by inbound links from quality sources. That, in itself, may be rather useful for some.

What about PDFs? It seems that some PDFs are collected and indexed alongside the HTML. Not many PDFs seem to make it into the index, though. For instance, another forum comment showed a table for the March 2015 crawl, which had 3,111,864 PDFs against 1.6Bn HTML pages. The PDFs that do make it in often appear to be truncated.
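That HTML-versus-PDF tally can be reproduced from the index records themselves, by counting the mime field. A rough sketch, with a handful of invented records standing in for real index output:

```python
from collections import Counter

# Invented sample records in the shape of Common Crawl index output;
# a real tally would run over millions of records from a full crawl.
records = [
    {"url": "http://example.org/", "mime": "text/html"},
    {"url": "http://example.org/paper.pdf", "mime": "application/pdf"},
    {"url": "http://example.org/about", "mime": "text/html"},
]

by_mime = Counter(rec.get("mime", "unknown") for rec in records)
print(by_mime.most_common())  # [('text/html', 2), ('application/pdf', 1)]
```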

Google Books corpora

Google Books corpora, an alternative search interface for Google Books…

“This new interface for Google Books allows you to search more than 200 billion words [though it is] not an official product of Google or Google Books. Rather it was created by Mark Davies, Professor of Linguistics at Brigham Young University…”

A Google link: search suggests the tool has had some notice from the field of linguistics, but little from outside it. For non-linguists the tool seems to serve as a rather useful way of quickly bouncing a Google Books search through to a specific decade, without spending a minute fiddling around with the custom date-range fields in the small Google Books date drop-down box. The tool may be especially useful for those who need to run this sort of Google Books search many times across many different decades.

As a test I looked for the phrase “a whit” (as in “not a whit of it” and “He had not changed a whit”), and then clicked on the link to occurrences from the period 1900-1910. I was taken to the book results on Google Books, and saw that the custom date range had been automatically constrained to 1 Jan 1900 – 31 Dec 1909. Google showed some confusion with “a Whit-sunday”, but finessing the search terms would probably have fixed that.


OpenURL and linkrot

“Measuring Journal Linking Success from a Discovery Service”, March 2015…

“OpenURL has become, in a sense, the glue that holds the infrastructure of traditional library research together, connecting citations and full text. … [We found that] One-click (OpenURL) resolution was noticeably poorer [than Summon], with about 60% of requests leading directly to the correct fulltext item. More alarming, we found that, of full-text requests linked through an OpenURL, a large portion — 20% — fail.”

So… only 60% of fulltext requests resolve directly to the correct item, and 20% fail altogether. That still leaves OpenURL with roughly a 40% chance of not taking you straight to your fulltext, for the very glue said to hold library research together.

The Mysterious Case of the Case-bound Book

The Guardian‘s ‘Anonymous Academic’ runs some numbers today on overly expensive academic hardbacks, the sort that gather dust on the shelves of university libraries…

“Seventy-five books [per editor, per year], £80 each, selling on average 300 copies. That’s £1.8m. And he’s just one of their commissioning editors.”

The Guardian‘s academic was told that “friends [can] act as reviewers” for his book proposal. And that the author and his proposal-reviewer “friends” might also add the book to class reading lists, and thus ease it toward becoming a library purchase. Left unsaid, at least in the publisher’s initial phone pitch, is the implication that “friends” might also write book reviews of the title after publication.

These are the sort of books for which there will never be a cheap paperback version, just the choice of a very nice £60-£80 case-bound hardback or an ebook that’s only slightly cheaper than the paper edition. By my rough calculation the profit per £75 book is around £12,000, even on only 300 sales: 300 × £75 = £22,500 in revenue, minus a £4,500 freelance fee and 300 × £20 = £6,000 in manufacturing and shipping. To reach that figure I assume each book proposal is swiftly handed off after approval to a home-working freelance, who might be paid £4,500 per book to get it into a publishable state. I also assume a £20 manufacturing and shipping cost per copy, since in my limited experience as a reviewer and shelf-browser such books tend to be print-on-demand from Lightning Source (look at the very tiny small-print at the back of the book). Every ebook edition sold, however, would mean about £17 extra profit per book — assuming some of that £17 isn’t passed along as a discount to the library’s purchasing clerk.

If a telesales lead-generator and initial author handler is given a target of drumming up 75 new book titles per year, as The Guardian‘s article suggests, in the expectation that he only delivers 50, then he’s potentially generating £600,000 profit per year for someone (50 × £12,000). One suspects his own salary amounts to far less than that.

At that sales/profit ratio might the academic world need to guard against a de facto ‘guaranteed book purchasing’ ring? Perhaps one loosely spread across the world’s libraries and differently configured/staggered for each book title?

Chimp vs. Elsevier, Chimp wins…

ChimpFeedr RSS Feed Aggregator is a useful service from the popular MailChimp mailing-list service…

“Enter a bunch of RSS feeds into ChimpFeedr, and we’ll mash ’em up into one master RSS feed.”

Since Yahoo Pipes is closing down at the end of September 2015, those with an RSS mixing pipe at Yahoo might be interested in this offshoot of the MailChimp service. And, being from a big company like MailChimp, it may be more reliable than other similar services.

To test it I popped in the RSS feed from each of Elsevier’s hybrid OA humanities-ish journals. There are only about 15 or so such titles, set against Elsevier’s thousands of science journals. I assume that the resulting Chimp-tastic combo-feed captures all the OA articles currently published in these titles, though unfortunately it doesn’t re-sort by date with the most recent first.
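For anyone who misses that date re-sort, it’s easy enough to do locally once you have the feeds. A minimal sketch using only the Python standard library, with two tiny dummy feeds standing in for real journal feeds:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def items_from_rss(rss_xml):
    """Yield (date, title, link) tuples from one RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        date = parsedate_to_datetime(item.findtext("pubDate"))
        yield date, item.findtext("title"), item.findtext("link")

def merge_feeds(feeds_xml):
    """Combine the items of several feeds, most recent first."""
    merged = [it for feed in feeds_xml for it in items_from_rss(feed)]
    return sorted(merged, key=lambda it: it[0], reverse=True)

# Two dummy feeds standing in for real journal RSS feeds.
feed_a = """<rss version="2.0"><channel><title>Journal A</title>
<item><title>Older article</title><link>http://example.org/old</link>
<pubDate>Mon, 06 Jul 2015 10:00:00 GMT</pubDate></item>
</channel></rss>"""
feed_b = """<rss version="2.0"><channel><title>Journal B</title>
<item><title>Newer article</title><link>http://example.org/new</link>
<pubDate>Tue, 04 Aug 2015 10:00:00 GMT</pubDate></item>
</channel></rss>"""

for date, title, link in merge_feeds([feed_a, feed_b]):
    print(date.date(), title, link)
```

In practice the feed XML would be fetched over HTTP; feeds with Atom markup or namespaced elements would need a little more handling than this sketch gives them.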

I then used feed2js to get the combo-feed results onto a static HTML page [now removed]. The feed’s content came in with date, authors, title, abstract, and journal title. It also passed working links to the full-text articles.
