Wikipedia increasingly citing journals

07 Tuesday Jul 2009

Posted by futurilla in Spotted in the news

New figures on the number of links to journals from Wikipedia pages. Although I’m not sure I’d call Autocar or the New York Times academic journals, most of the titles look like major(?) medical journals and can thus be assumed to be peer-reviewed.

Battle of Waterloo

07 Tuesday Jul 2009

Posted by futurilla in Ooops!

Oops! Found on the library blogs today…

“ASME failed to invoice us, hence did not get paid and they cut us off. This means all ASME ejournals are cut off, even the years we’ve paid for! … The problem is being worked on but it may be up to a week to get fixed.”

Archival-quality POD?

06 Monday Jul 2009

Posted by futurilla in Spotted in the news

Nice to see that someone is thinking of posterity…

“the Bodleian Library will make at least one existing run of print titles of the e-journals it acquires”

ClearType gives a “5% speed improvement in reading”

05 Sunday Jul 2009

Posted by futurilla in Spotted in the news

Those who care about clarity when reading from screens may like to read this long account, from a Microsoft engineer, of the improvements to ClearType in the new Windows 7, including a new ClearType Tuner. ClearType is absolutely vital if you want to use your laptop as if it were an ebook reader.

I’ve used the Tuner (my desktop is now running on Windows 7 RC) and it works well, although the personalised changes it makes are very subtle. ClearType is enabled by default in Windows 7.

Those who are not going to plough through the article may still be interested to read the key findings…

* We’ve measured an improvement in word recognition accuracy of 17% using ClearType over bi-level rendering.

* We’ve found a 5% speed improvement in reading speed and a 2% improvement in comprehension (this is remarkable) using ClearType

XP and Vista have ClearType — but I don’t think it’s enabled by default in XP, and there’s no Tuner in Vista (the Tuner is an optional download). Given the figures above, it sounds like students would benefit from having a canteen “ClearType tuning surgery” for their laptops, during the Autumn term.

Oh, and there’s another nice if rather minor benefit for Windows 7 users. W7 comes with a native standalone XPS reader, XPS being the “XML paper specification” which is a competitor to PDF. Sadly the XPS reader/viewer appears to have no sample XPS documents, although you can download the official Microsoft sample pack here. It’s primitive as a reader, but unlike Acrobat Reader, XPS Reader automatically shows pages in two-up view (aka ‘facing’) when in full-screen mode. In Acrobat you need to burrow into Edit / Preferences / Full Screen / Uncheck box to “Fill screen with one page at a time” to get a two-up page display in full-screen mode.

Free OCR for Google Book Search pages

05 Sunday Jul 2009

Posted by futurilla in JURN tips and tricks, JURN's Google watch

≈ 11 Comments

Ever wanted to take the hassle out of re-typing a short quote, found on Google Books? Free OCR is a simple online OCR application that might help.

To test it, I gave it a very unpromising bit of text captured from Google Books using a standard screen-capture utility — slightly skewed, slightly fuzzy, in a non-standard typeface I’m willing to bet no-one has on their system, captured as a JPG at a mere 72 dpi, and just 500 pixels wide…

[Image: ocr-test]

A few seconds after uploading, it gave me this…

ADVERTISEMENT.
Tms publication of the Works of Jomv KNOx, it is
supposed, will extend to F’ive Volumes. It was thought
advisable to commence the series with his History of
the Reformation in Scotland, as the work of greatest
importance. The next voliune will thus contain the
Third and Fourth Books, which continue the History to
the year 1564; at which period his historical labeurs
maybeconsideredtoterminate. ButtheFi&hBook,
forming a sequel to the History, and published under
his name in 1644, will also be included. His Letters
and Miscellaneous Writings will be arranged in the
subsequent volumes, as nearly as possible in chronolo-
gical order; each portion being introduced by a separate
notice, respecting the manuscript or printed copies from
which they have been taken.
It may perhaps be expected that a Life of the Author
should have been prefixed to this volume. The Life of
Knox., by Ds. M‘Cms, is however a work so universally
, known, and of so much historical value, as to supersede
l any attempt that might be made for a detailed bio-

Not perfect, but not bad for such a poor-quality capture. Stand-alone OCR software usually demands a much better quality source.
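Text pasted back from any OCR tool usually keeps the line breaks and end-of-line hyphens of the printed page, as in "chronolo-" / "gical" above. Here is a minimal Python sketch of a tidy-up step; the function name `tidy_ocr_text` is my own invention for illustration, it is not part of Free OCR, and it does nothing about misread letters.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Join words split by an end-of-line hyphen, then unwrap the lines.

    Hypothetical helper for illustration only. It assumes every
    end-of-line hyphen is a typesetting break, so a genuine compound
    that happens to break at a line end will lose its hyphen too.
    """
    # "chronolo-\ngical" -> "chronological"
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Any remaining line break becomes a single space
    return re.sub(r"\s*\n\s*", " ", joined).strip()


sample = (
    "and Miscellaneous Writings will be arranged in the\n"
    "subsequent volumes, as nearly as possible in chronolo-\n"
    "gical order; each portion being introduced by a separate"
)
print(tidy_ocr_text(sample))
```

The result is one unwrapped paragraph that is ready to proof-read against the page image, which is still needed for errors such as "voliune".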

The popular screenshot software HyperSnap v6 promises to do the same with its TextSnap feature, but for some unknown reason this feature just doesn’t work with Google Books or the captured image above. I suspect it can only handle text that uses system fonts.

So until we get a neat free OCR Firefox addon (which is a direction I would urge the makers of Free OCR to go in) then screenshot – save image – upload image to Free OCR is a viable and speedy workflow for OCR-ing fair-use quotes found on Google Book Search or other places that only offer plain page-scans.

Oh, and don’t bother doing this for books that are already in the public domain — since last month Google provides the full-text of these for download, and also serves it up via Google Book Search Mobile.

** Update: If you have Microsoft Office 2007 or higher, then I find that the included Microsoft OneNote works just as well for OCR on low-res images such as the one above. It also works well on most PDFs that don’t allow copy/paste. See the comments to this post for details.

intitle: works with Google Blogs search

05 Sunday Jul 2009

Posted by futurilla in JURN tips and tricks, JURN's Google watch

Here’s a useful tip for those who want better precision while wading through the Google Blog Search “blog bog”. The search modifier intitle: works with Google Blog Search, restricting matches to words that appear in the post title.
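For anyone who builds such searches in a script, here is a rough Python sketch of putting the query text together. The helper name `intitle_query` is hypothetical, and no search URL is given because I have not checked the address or parameters the service expects; only the query string itself is built.

```python
from urllib.parse import quote_plus


def intitle_query(title_words: str, other_terms: str = "") -> str:
    """Build a query that restricts title_words to the post title.

    Hypothetical helper for illustration only.
    """
    # A multi-word title is quoted so the whole phrase is tied to intitle:
    phrase = f'"{title_words}"' if " " in title_words else title_words
    return f"intitle:{phrase} {other_terms}".strip()


query = intitle_query("open access", "journals")
print(query)              # intitle:"open access" journals
print(quote_plus(query))  # intitle%3A%22open+access%22+journals
```

The second line of output is the same query encoded for use as a URL parameter value.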

Non non-destructive scanning

05 Sunday Jul 2009

Posted by futurilla in Ooops!

The man who ripped books…

“I have a sheet-fed scanner — a Fujitsu Scan Snap S510M [$350] — which works quickly. It handles about 20 sheets per minute, scanning both sides. A 200 page book takes about 5 minutes to scan. The problem is turning a bound book into sheets. I’ve been using a utility knife to cut the pages […] the knife only takes a few minutes. In less than 10 minutes I can reduce a bulky 2-3 pound book to a weightless file with all the typography, graphics and even the paper’s color preserved in a PDF.”

A better option, which means you can still sell the books afterwards. Or donate them to a library.

Or you could just run a $65 barcode scanner over the back of each book, keep them (“Books furnish a room”, etc) and then search a lot of your library via Google Books. Plus you get a record of your library for insurance purposes, in case the house burns down. Which in large sections of the American desert/scrubland is apparently a real possibility. I’d imagine it might be quite useful for scholars in repressive countries too, where one might suddenly have to flee the country without a personal library.
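The scanning figures in the quote at the top of this post are easy to check: a 200 page book is 100 sheets, and at about 20 sheets per minute with both sides scanned at once that is 5 minutes. A small Python sketch of the arithmetic follows; the function name and its defaults are my own, taken from the quoted figures rather than from any Fujitsu specification.

```python
import math


def scan_minutes(pages: int, sheets_per_minute: float = 20, duplex: bool = True) -> float:
    """Estimate feed time for a book that has been cut into loose sheets.

    Hypothetical helper for illustration. 'pages' counts printed sides,
    so a duplex scanner needs one pass per sheet (two pages), while a
    single-sided scanner needs one pass per page.
    """
    passes = math.ceil(pages / 2) if duplex else pages
    return passes / sheets_per_minute


print(scan_minutes(200))                # 5.0
print(scan_minutes(200, duplex=False))  # 10.0
```

This covers feed time only. The cutting, which the quote puts at a few minutes, is extra.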

Big apple

03 Friday Jul 2009

Posted by futurilla in JURN blogged

JURN is reference site of the day at the Brooklyn Public Library in New York City, the fifth largest public library in the United States.

The hidden economics of Open Access

03 Friday Jul 2009

Posted by futurilla in Economics of Open Access

Joseph Gelfer criticises aspects of the paper “But what have you done for me lately? Commercial Publishing, Scholarly Communication, and Open-Access” (2009) by John P. Conley and Myrna Wooders, with special focus on the value that paid editors can bring in terms of polishing manuscripts.

In the second half of the post, Gelfer also points out that…

“the volunteer labor on which many OA journals … are based hides the true cost of doing business. One would expect an economist to make more of this analysis, but the fact that $0 is spent on editing an OA journal does not result in zero cost. Costs come in many shapes and forms: that hour of volunteer copyediting from our editorially skilled and willing academic comes at the cost of their employer, or family, or an hour of leisure activity. … when such [OA] mandates rely on unpaid labor, they also have the potential to erase the skills of academics and publishing professionals who may otherwise reasonably demand an honest day’s pay for an honest day’s work … the glossing over of economic realities does no service to OA’s moral high-ground”

The other hidden long-term cost factor here is training. Professionals may have invested years of their life in training courses and self-learning, whereas volunteer OA editors are seemingly expected to “just know how to do it”. Not only are volunteer editors not paid (even in terms of workload allowances), they’re not paid to train for their role either.

Self-archiving after publication

02 Thursday Jul 2009

Posted by futurilla in Economics of Open Access

The Occasional Pamphlet (a law blog at Harvard) has a long and detailed posting on the issues around the public self-archiving of academic articles, after publication in an academic journal.

News from JURN

~ search tool for open access content
