Open Marginalis

20 Thursday Aug 2015

Posted by futurilla in Spotted in the news

Open Marginalis, medieval marginalia in open access.

Documenting traps

20 Thursday Aug 2015

Posted by futurilla in Spotted in the news

A new guide “to defeating tracking traps that could identify document leakers”, where the leaked documents might be academic journal articles from behind paywalls…

“Harding’s popular lecture detailed the watermarking and metadata techniques used to identify works and listed tools that can identify and circumvent both mechanisms.”

“SFX Miscellaneous Free Ejournals Target”

19 Wednesday Aug 2015

Posted by futurilla in Academic search, Economics of Open Access, Spotted in the news

“SFX Miscellaneous Free Ejournals Target: Usage Survey Among the SFX Community”, Serials Review (2015), 41(2), pp. 58-68.

SFX is an OpenURL link resolver product for university libraries, focussed on the output of traditional publishers — of which 16-20% is apparently so dodgy in terms of quality that it breaks the system. Yet, rather amazingly, it appears that much of this 16-20% is still allowed to get to the point-of-use.

The article briefly surveys recent findings on how SFX copes with open access articles, and then the rest of the paper gives the results of a survey of librarians who integrate a specific ‘free’ section of SFX with their library discovery tools. It appears that scholars looking for open free full-text via SFX can expect way over 20% dead link errors on URLs…

“… one category [of failure] (incorrect parse params) alone leads to 20% false positives (dead links) for MFE [the largest ‘free’ target in SFX]. Besides incorrect parse params, there are numerous other reasons for the occurrence of false positives (dead links), such as resolver translation error, inaccurate embargo data, provider target URL translation error, incomplete provider content, wrong coverage dates, indexed-only titles mistakenly considered as fulltext titles, and other reasons listed in the literature review section.”

So that might mean… perhaps 40% of links to open access full-text are dead? Or even more, like… 60%? The article doesn’t hazard a guess.

The DOAJ ‘targets’ are apparently not much better…

“It’s an irony that I find discovery services generally have much poorer coverage of Open Access than Google Scholar. … Most discovery services have indexed DOAJ (Directory of Open Access Journals), but many libraries experience such a bad linking experience they just turn it off” (Aaron Tay, July 2015).

I’m pleased to say that JURN should have close to zero dead links on standalone journals, due to the way it is set up. JURN may lead to a few fleeting “server maintenance” / “timeout” errors here and there, but if the journal’s base URL for articles moves then its articles effectively get auto-removed from JURN’s results. But they get found again within a year at most, through an effective two-pronged method.

AWOL releases cleaned A-Z URL list.

18 Tuesday Aug 2015

Posted by futurilla in Economics of Open Access, My general observations, Spotted in the news

AWOL has a fascinating post today. It’s on the attempts to identify which AWOL-linked resources have already been ingested into major long-term Web archives, and which haven’t. As part of that experiment Charles and his helpmate Ryan have offered their readers a nice big cleaned A-Z list of the “52,020 unique URLs” linked from AWOL, which is very good of them. I might clip these URLs back and de-duplicate them, and then do a side-by-side sheet with JURN’s own indexing URLs and thus see what’s missing from JURN. Very little in terms of post-1945 journal articles, I suspect, though there may be some I’ve missed.
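The clip, de-duplicate and compare idea might look something like the rough Python sketch below. It assumes that “clipping back” means reducing each URL to its host plus directory path (dropping any file name, query string and “www.” prefix), and that both lists hold full URLs with a scheme. The function names and the example.org / example.net addresses are purely illustrative.

```python
from urllib.parse import urlsplit


def clip_url(url):
    """Clip a full URL back to 'host/directory' (an assumed meaning of 'clipping')."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path
    directory, _, last = path.rpartition("/")
    # If the final path segment looks like a file name, keep only its directory.
    if "." in last:
        path = directory
    return host + path.rstrip("/")


def missing_from_jurn(awol_urls, jurn_urls):
    """Return clipped AWOL locations that have no match in the JURN list."""
    awol = {clip_url(u) for u in awol_urls}   # the set also de-duplicates
    jurn = {clip_url(u) for u in jurn_urls}
    return sorted(awol - jurn)


# Made-up sample data, standing in for the two real lists.
awol_sample = [
    "http://www.example.org/journal/issue1.pdf",
    "http://example.org/journal/",
    "http://ancient.example.net/papers/a.html",
]
jurn_sample = ["https://example.org/journal/index.html"]

print(missing_from_jurn(awol_sample, jurn_sample))
# ['ancient.example.net/papers']
```

Clipping to the directory level is a fairly blunt comparison, so a real side-by-side sheet would probably still need checking by eye.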

Of course a JURN Search already runs across the AWOL pages, as well as a great many of the post-war full-text originals (via Google). But if I were an Ancient History scholar I might now be tempted to get together with others to crowdfund a mass download of AWOL’s full-text, so that I could search across the full-text locally and minutely, without having to rely on Google etc. I reckon the entire set of AWOL full-text would fit on a 1.5TB external drive and would cost around $10,000 to harvest by hand/eye. Why would that be needed? I’m assuming that many long-term Web archives are ‘dark’ or that license complications mean no single archive can ingest the entirety of what AWOL points to.

My calculations for the $10k figure start with the fact that a little over 10,000 of AWOL’s 52,020 URLs are straight-to-PDF links, and so very easily downloaded by a harvesting bot. Assuming an average of 5MB per PDF, that means about 50GB of disk storage space for those PDFs.

If one then assumes that perhaps 10,000 of the URLs are not going to articles (but rather to things that are difficult to extract and archive, such as huge datasets, or sites that show scans of original source manuscripts and old books in zoomable and frame-nested forms), then that might leave 32,000 URLs that are most likely to be links to either journal TOC pages or individual articles.

Let’s assume that each of the 32,000 TOC page URLs leads to an average of 16 articles and reviews (though some 2,000 may be home-page links sitting above links to issue TOCs). So 32,000 x 16 = 512,000 articles of some kind, in PDF or HTML, on average weighing 1.5MB each. So that’s 768GB in total. In that case one might easily store all the AWOL-discovered full-text on an $80 1.5TB external disk, and have space to spare for the desktop indexing software’s own index, which would be fairly big. That is a product that I might find very useful, if I were an Ancient History student, specialist, or independent scholar without access to university databases.
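As a quick check on the storage sums, here is a small Python sketch. It takes the “little over 10,000” PDF links as a round 10,000, uses the averages assumed above (5MB per PDF, 16 articles per URL, 1.5MB per article), and counts 1GB as 1,000MB. The function name is just illustrative.

```python
def storage_gb(item_count, avg_mb_each):
    """Estimated storage in GB, counting 1GB as 1,000MB."""
    return item_count * avg_mb_each / 1000


pdf_gb = storage_gb(10_000, 5)              # straight-to-PDF links
article_count = 32_000 * 16                 # TOC/article URLs x articles per URL
article_gb = storage_gb(article_count, 1.5)
total_gb = pdf_gb + article_gb

print(pdf_gb)         # 50.0
print(article_count)  # 512000
print(article_gb)     # 768.0
print(total_gb)       # 818.0
```

On those assumptions the whole harvest comes to a bit over 800GB, which still leaves room on a 1.5TB disk for a search index.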

But how to harvest those 512,000 articles? The brute force way would be to parcel up the 32,000 URLs into parcels of 150 each. That’s about 214 parcels x 150 URLs. If one were paying 20 cents per URL to Indian freelancers, to go in and spend 3 minutes grabbing whatever articles are hanging off each of those 150 page URLs, plus the page, then that would cost $30 per parcel. Let’s say $40, with a quality bonus. Let’s say it takes four hours to do the 150 URLs and not miss anything. So that’s $10 U.S. an hour, which is pretty good for an Indian freelancer with broadband. I don’t think anyone would be being exploited on that deal. So the whole 32,000 URL set would cost about $8,560 to harvest by hand and eye, which seems well within the range of a small crowdfunding campaign.
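The parcel sums can be laid out as a short Python sketch. The inputs are the assumptions stated above (32,000 URLs, 150 URLs per parcel, $40 per parcel, four hours per parcel), and the function name is only illustrative.

```python
import math


def harvest_cost(total_urls, urls_per_parcel, pay_per_parcel, hours_per_parcel):
    """Return (number of parcels, total cost in dollars, hourly rate in dollars)."""
    # Round up, since a part-filled final parcel still has to be done and paid for.
    parcels = math.ceil(total_urls / urls_per_parcel)
    total = parcels * pay_per_parcel
    hourly = pay_per_parcel / hours_per_parcel
    return parcels, total, hourly


parcels, total, hourly = harvest_cost(32_000, 150, 40, 4)
print(parcels)  # 214
print(total)    # 8560
print(hourly)   # 10.0
```

Changing any one input (say, a larger bonus or slower workers) shows quickly how sensitive the total is to these guesses.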

Of course, it might be that the articles could be wholly or partly harvested by bot. But I suspect that a simple “page + anything it links to” harvest would bring in a lot of chaff alongside the articles, given the very varied and non-standard nature of what AWOL links to. Perhaps that wouldn’t matter in practice, when keyword searching across the entire harvest. Or one might be able to use a more intelligent bot, one using Google Scholar-like article-detection algorithms.
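To give a feel for the “page + anything it links to” approach, here is a very naive Python sketch that pulls the links out of one page’s HTML and keeps only those that look like articles. The filter rule (a path ending in .pdf, or containing “/article”) is purely my assumption, and is nothing like as clever as Google Scholar’s article detection. The class and function names and the example.org page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def likely_article_links(html, base_url):
    """Keep only links that pass a crude, assumed 'looks like an article' test."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    keep = []
    for link in collector.links:
        path = urlsplit(link).path.lower()
        if path.endswith(".pdf") or "/article" in path:
            keep.append(link)
    return keep


# A made-up TOC page, used here instead of a live download.
sample_html = (
    '<a href="issue1/paper.pdf">Paper</a>'
    '<a href="/about.html">About</a>'
    '<a href="/articles/12.html">Review</a>'
)
print(likely_article_links(sample_html, "http://example.org/journal/"))
# ['http://example.org/journal/issue1/paper.pdf', 'http://example.org/articles/12.html']
```

A rule this simple would both miss articles and let chaff through on the more unusual sites, which is why a smarter bot or a human eye might still be needed.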

Element Hiding Helper updates, changes

16 Sunday Aug 2015

Posted by futurilla in JURN tips and tricks, Spotted in the news

AdBlock Plus’s Element Hiding Helper has been updated. It no longer resides on the right-click mouse menu. You need to enable the top menu bar button for AdBlock (View|Toolbars|Customise), and it then launches from the drop-down menu on that icon.

The new method of selecting a block to hide takes a minute of getting used to. If you can comprehend nested HTML code at a glance then it’s not necessarily easier than before, since it’s now trickier to identify the master container DIV for the whole block you want to hide. However, other users will probably find it a bit easier and more visual to use.

Element Hiding Helper is useful for customising “noisy” websites such as newspaper front pages, which blast you with celebrity news sidebars, scrolling tickers, sports sections and other regular items you never read.

Retraction Watch boosted by $400,000 grant

04 Tuesday Aug 2015

Posted by futurilla in Spotted in the news

Retraction Watch has been given a $400,000 grant from the John D. and Catherine T. MacArthur Foundation, “to create a comprehensive database of retractions, allowing us to hire our first staff writer”.

Depending on the form it takes, this could potentially be indexed by JURN. It would have to be one retraction per page, and have the OA status indicated in the URL path, for example: www.database.fuz/articles/oa/article725.html
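To illustrate what “OA status indicated in the URL path” would allow, here is a tiny Python sketch that tests whether a URL has an “oa” directory somewhere in its path. The “oa” marker and the second, non-OA URL are hypothetical, as is the function name; this is not how JURN itself is configured.

```python
from urllib.parse import urlsplit


def is_oa_path(url, marker="oa"):
    """True if one of the URL's directory segments equals the OA marker."""
    if "://" not in url:
        url = "http://" + url   # urlsplit needs a scheme to find the host
    segments = urlsplit(url).path.strip("/").split("/")
    # Ignore the final segment, which is the page itself rather than a directory.
    return marker in segments[:-1]


print(is_oa_path("www.database.fuz/articles/oa/article725.html"))      # True
print(is_oa_path("www.database.fuz/articles/closed/article726.html"))  # False
```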

Microsoft Academic Graph

19 Sunday Jul 2015

Posted by futurilla in Academic search, Spotted in the news

Microsoft Academic Graph…

“The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals and conference “venues” and fields of study. This data is available as a set of zipped text files … The file size is ~37GB.”

Open Library of Humanities

10 Friday Jul 2015

Posted by futurilla in Spotted in the news

Open Library of Humanities has just landed a grant for $741,000…

“Birkbeck, University of London has been awarded a three-year grant of $741,000 from the Andrew W. Mellon Foundation to cement and expand a new model for open-access publishing in the humanities disciplines.”

This is to be centred on making a peer-reviewed open access…

“mega-journal, multi-journal and books platform for the humanities”, “with no author-facing charges”.

No full-text search and access at the OLH, as yet. But it looks like many major university libraries are signing up in principle.

The Public Impact of Latin America’s Approach to Open Access

09 Thursday Jul 2015

Posted by futurilla in Spotted in the news

Juan Pablo Alperin, “The Public Impact of Latin America’s Approach to Open Access”, June 2015.

“It is evident that the degree of adoption of the OA models is fairly extensive [in Latin America], although there are no exact figures. … The highest estimate, although not based on a rigorous study, comes from the director of SciELO, an expert in scholarly communications in Latin America, who suggests that 95% of all online journals published within the region are fully OA. Unfortunately, none of the databases that collect subscription information provide an adequate sample from which to gather a more exact estimate.”

“The underlying assumption, found repeatedly in the OA literature, is that the OA portals in Latin America are seen as contributing to “development” by extending the readership and circulation of Latin American research, thereby connecting them to a global “system of science” [but until now] nobody has attempted to verify the underlying assumption that there is interest from a broader community of readers in accessing research from developing regions.”

DOAJ removes all SCIRP journals

06 Monday Jul 2015

Posted by futurilla in Spotted in the news

The DOAJ has removed a very large list of journal titles from the publisher SCIRP (Scientific Research Publishing Inc.), along with the titles of SCIRP’s Chinese associate publisher Hans Publishers Inc., citing alleged “Editorial misconduct” from both publishers.

Neither publisher was directly indexed in JURN.

News from JURN

~ search tool for open access content

Category Archives: Spotted in the news
