Google Caffeine

11 Tuesday Aug 2009

Posted by futurilla in JURN's Google watch

Google is apparently set for a massive jolt soon — of Caffeine.

“Web search” links vanish from Google Scholar

15 Wednesday Jul 2009

Posted by futurilla in Academic search, JURN's Google watch

There seem to be some changes going on at Google Scholar, under the hood. First the RefWorks functionality recently vanished, and now all “Web search” links have vanished. Scholar used to place “Web search” under many links, offering a one-click method of searching the Web for the title/author in the hope of finding the full-text or a commentary. Now we have to manually copy & paste.

Google Scholar & More

13 Monday Jul 2009

Posted by futurilla in Academic search, JURN's Google watch

I found another recent book on Google Scholar: Google Scholar & More: New Google Applications & Tools For Libraries (Routledge, Oct 2008). It originally sold for a whopping $150.00, but Amazon has 26 used copies from $37. Oddly, there seems to be not a single review freely available online, not even on the Amazon U.K. or U.S. pages for the book. So I’m not sure what that says about the book’s usefulness, but I thought I’d mention here, for those who may be interested, that it can now be had cheap on Amazon.

Google Images now sort-of CC-searchable

11 Saturday Jul 2009

Posted by futurilla in JURN's Google watch

Google Images just introduced the ability to search for images tagged with usage rights…

[Image: google-images-filter]

Free OCR for Google Book Search pages

05 Sunday Jul 2009

Posted by futurilla in JURN tips and tricks, JURN's Google watch

Ever wanted to take the hassle out of re-typing a short quote, found on Google Books? Free OCR is a simple online OCR application that might help.

To test it, I gave it a very unpromising bit of text captured from Google Books using a standard screen-capture utility — slightly skewed, slightly fuzzy, in a non-standard typeface I’m willing to bet no-one has on their system, captured as a JPG at a mere 72 dpi, and just 500 pixels wide…

[Image: ocr-test]

A few seconds after uploading, it gave me this…

ADVERTISEMENT.
Tms publication of the Works of Jomv KNOx, it is
supposed, will extend to F’ive Volumes. It was thought
advisable to commence the series with his History of
the Reformation in Scotland, as the work of greatest
importance. The next voliune will thus contain the
Third and Fourth Books, which continue the History to
the year 1564; at which period his historical labeurs
maybeconsideredtoterminate. ButtheFi&hBook,
forming a sequel to the History, and published under
his name in 1644, will also be included. His Letters
and Miscellaneous Writings will be arranged in the
subsequent volumes, as nearly as possible in chronolo-
gical order; each portion being introduced by a separate
notice, respecting the manuscript or printed copies from
which they have been taken.
It may perhaps be expected that a Life of the Author
should have been prefixed to this volume. The Life of
Knox., by Ds. M‘Cms, is however a work so universally
, known, and of so much historical value, as to supersede
l any attempt that might be made for a detailed bio-

Not perfect, but not bad for such a poor-quality capture. Stand-alone OCR software usually demands a much better quality source.

The popular screenshot software HyperSnap v6 promises to do the same with its TextSnap feature, but for some unknown reason this feature just doesn’t work with Google Books or the captured image above. I suspect it can only handle text that uses system fonts.

So until we get a neat free OCR Firefox addon (which is a direction I would urge the makers of Free OCR to go in) then screenshot – save image – upload image to Free OCR is a viable and speedy workflow for OCR-ing fair-use quotes found on Google Book Search or other places that only offer plain page-scans.

Oh, and don’t bother doing this for books that are already in the public domain — since last month Google provides the full-text of these for download, and also serves it up via Google Book Search Mobile.

** Update: If you have Microsoft Office 2007 or higher, then I find that the included Microsoft OneNote works just as well for OCR on low-res images such as the one above. It also works well on most PDFs that don’t allow copy/paste. See the comments to this post for details.

intitle: works with Google Blog Search

05 Sunday Jul 2009

Posted by futurilla in JURN tips and tricks, JURN's Google watch

Here’s a useful tip for those who want better precision while wading through the Google Blog Search “blog bog”. The search modifier intitle: works with Google Blog Search.

Basic name authority added to Google News

26 Friday Jun 2009

Posted by futurilla in JURN's Google watch

Google News has just introduced a new feature to find articles written by someone, rather than about someone…

“If you spot an article by a specific journalist, you can click their name to bring up other articles they’ve written.”

With this and the Google News RSS feed, it’s now possible to set up a simple news feed for new articles from your fave journalists. Possibly in the elegant Firefox addon Feedly. Don’t forget to click “sort by date” before you grab the feed.

And since you can plug RSS feeds into pages, you could now set up a public Daily Something page, cut out the churnalist press-releases and just have a select band of top specialist journalists effectively writing for you. This is really going to annoy the newspaper publishers.

And you can also type a simple search modifier into the Google News search-box, e.g.:

author:"Matthew Parris"

And this can be combined with the source: modifier…

source:Washington_Times
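
As a rough illustration of combining the two modifiers, here is a minimal Python sketch. The function name `build_news_query` is my own invention, and the underscore-for-space convention is only inferred from the source:Washington_Times example above. The two example values are paired purely to show the syntax.

```python
from urllib.parse import quote_plus


def build_news_query(author=None, source=None, terms=""):
    """Join the author: and source: modifiers into one search string.

    Hypothetical helper: it only builds the text you would type
    into the Google News search-box.
    """
    parts = []
    if author:
        # Straight double quotes keep the name together as a phrase.
        parts.append(f'author:"{author}"')
    if source:
        # Assumption: spaces in a source name become underscores,
        # as in the source:Washington_Times example.
        parts.append("source:" + source.replace(" ", "_"))
    if terms:
        parts.append(terms)
    return " ".join(parts)


query = build_news_query(author="Matthew Parris", source="Washington Times")
print(query)              # the string to type into the search-box
print(quote_plus(query))  # the same string, URL-encoded
```

The URL-encoded form is handy if you want to paste the query into a feed address rather than type it by hand.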

The perils of long article titles

23 Tuesday Jun 2009

Posted by futurilla in Academic search, JURN tips and tricks, JURN's Google watch

Here’s a useful tip: Google’s intitle: search modifier only works if the search-results title/link uses the phrase. It seems that Google is not reading the article title from your metadata, but instead reading it from the links on a larger ‘upstream’ set of search results pages. For instance, searching for intitle:"The Searchers" Ford will not pick up…

“Home on the Range: Space, Nation, and Mobility in John Ford’s The Searchers”
from The Japanese Journal of American Studies, No. 13 (2002)

…because the article appears in search results as…

As you can see, “The Searchers” has dropped off the end of the link to be replaced with three dots. So using intitle: doesn’t find it.

Article titles should be around 50 characters or less (inc. spaces), to fit comfortably on a Google link. Or a 500-pixel width blog column, for that matter.

Google Scholar is more forgiving, only hitting the same problem at around 100 characters. But JURN works like the main Google, and so users should be aware of the difference.
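
The effect can be approximated with a short Python sketch. The 50 and 100 character limits are the rough figures given above. Cutting at the last whole word and adding three dots is my assumption about how the link gets shortened, not Google's documented behaviour, and both function names are hypothetical.

```python
def displayed_title(title, limit=50):
    """Approximate how a long title appears as a search-result link."""
    if len(title) <= limit:
        return title
    # Assumption: cut at the last whole word that fits, then add three dots.
    cut = title[:limit].rsplit(" ", 1)[0]
    return cut + "..."


def intitle_would_match(title, phrase, limit=50):
    """True if the phrase survives in the shortened link text."""
    return phrase.lower() in displayed_title(title, limit).lower()


title = ("Home on the Range: Space, Nation, and Mobility "
         "in John Ford's The Searchers")

print(len(title))                                              # 75 characters
print(displayed_title(title))
print(intitle_would_match(title, "The Searchers"))             # main Google, about 50
print(intitle_would_match(title, "The Searchers", limit=100))  # Scholar, about 100
```

At a limit of around 50 the phrase falls off the end of the link, while at around 100 the whole 75-character title fits and the phrase is found.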

A review of three major academic search-engines

22 Monday Jun 2009

Posted by futurilla in Academic search, JURN's Google watch

Following my own group-test, it’s interesting to see that Peter at Gale Reference Review has just published a detailed May 2009 review of three major academic search-engines. He takes a skeptical look at Web of Science (WoS), Scopus and Google Scholar. The article is rather long, but here are some interesting quotes…

“Google Scholar […] reports implausibly high citedness counts for most items, which becomes quite obvious when tracing the purportedly citing papers”

“I looked at the widely touted figures in the promotional materials [ of WoS and Scopus and found ] they should not be taken for granted. Many of these are incorrect and exaggerated. Their compilation has been fast and loose, sometimes making them fiction rather than fact.”

“The coverage of arts & humanities [ in Scopus ] is extremely poor (representing barely 1% of the database) [ and by comparison ] Web of Science has about […] 10 times as many for arts & humanities. [ and even if Scopus gets a boost, as proposed, it would still only have ] about 1/6th of what Web of Science has for these disciplines”

“It is one thing that Scopus has no cited references in records for papers published before 1996, but it adds insult to injury that the pre-1996 papers are ignored. This results in absurdly low h-index for many of the senior teaching and research faculty members and independent researchers who published papers well before 1996 which have been widely cited in the past 25-35 years […] Lazy administrators and bureaucrats stop here and ignore [ worthy people ] for some lifetime award”

Fab new additions today to Google Book Search

19 Friday Jun 2009

Posted by futurilla in JURN's Google watch

Some fab new additions to Google Book Search:

* A drop-down menu to navigate directly to a chapter.

* A YouTube-like “embed this book” code snippet.

* Sort search results by “relevance”, as well as page order.

* Expanded Book Overview page, with reviews and more keywords.

There are a few more additions, only applying to public-domain books.

Interestingly the new contents listing doesn’t seem to wholly rely on a table-of-contents, since Google apparently has a new “structure extraction technology” which is being added to the mix.

News from JURN

~ search tool for open access content

Category Archives: JURN's Google watch
