How to use JURN on mobile devices

07 Monday Dec 2009

Posted by futurilla in JURN tips and tricks

Google has announced on-the-fly mobile-device versions of all Google Custom Search Engines. When you visit the plain vanilla version of JURN on any mobile device, you’ll now be sent automatically to the mobile-optimised version.

Auto-detect language and auto-translate – all browsers should do this

07 Monday Dec 2009

Posted by futurilla in How to improve academic search, JURN tips and tricks, Spotted in the news


This is rather nice, and seems to have been released in the last few days. A new Chinese-language translation add-on for Firefox, where the language of the web page is auto-detected and the translation happens seamlessly within the existing page layout. There’s no messing around with tedious right-clicking, highlighting, hovering over buttons, etc. This is the first of many such add-ons, I would hope. Future browsers should have this built in, for all the major languages.

The only problem at present is that it’s rather too seamless. Users need a little visual flag to show when it’s been applied to a page. And perhaps a “toggle” button.

How to extract a CSV list of search-result URLs, along with their anchor titles

06 Sunday Dec 2009

Posted by futurilla in JURN tips and tricks, JURN's Google watch

≈ 7 Comments

In this simple tutorial I’ll show you how to rip a page of search result links into a .csv file, along with their link titles, using nothing more than Notepad and a simple bit of JavaScript.

(Update: January 2011. This tutorial has been superseded by a new and better one)

1) Have Google run your search in advanced mode, selecting “100 results on a page”. If you prefer Bing, choose Preferences / Results, and select “50 on a page”.

2) Run the search. Once you have your big page o’ results, just leave the page alone and save it locally — doing things like right-clicking on the links will trigger Google’s “url wrapping” behaviour on the clicked link, which you don’t want. So just save the page (In Firefox: File / Save Page As…), renaming it from search.html to something-more-memorable.html

3) Now open up your saved results page in your favourite web page editor, which will probably add some handy colour-coding to tags so you can see what you’re doing. But you can also just open it up in Notepad, if that’s all you have available. Right click on the file, and “Open with…”.

4) Locate the page header (it’s at the very top of the page, where the other scripts are), make some space in there, and then paste in this JavaScript…

A hat-tip to richarduie for the original script. I just hacked it a bit, so as to output the results in handy comma-delimited form.

5) Now locate the start of the BODY of your web page, and paste in this code after the body tag…

Save and exit.

6) Now load up your modified page in your web browser (I’m using Firefox). You’ll see a new button marked “Extract all links and anchor titles as a CSV list”…

Press it, and you’ll get a comma-delimited list of all the links on the page, alongside all the anchor text (aka “link titles”), in this standard format…

Highlight and copy the whole list, and then paste it into a new Notepad document. Save it as a .csv file rather than a .txt file. You can do this by manually changing the file extension when saving a file from Notepad.

7) Now you have a normal .csv file that will open up in MS Excel, with all the database columns correctly and automatically filled (if you don’t own MS Office, the free OpenOffice Calc should work as an alternative). In Excel, highlight the third column (by clicking so as to highlight its top bar), then choose “Sort and Filter” and then “A-Z”…

You’ll then be asked if you want to “Expand the selection”. Agree to expansion (important!), and the column with the anchor text in it will be sorted A-Z. Expansion means that all the columns stay in sync when one is re-sorted like this.

Now you can select and delete all the crufty links in the page that came from Google’s “Cached”, “Similar”, “Translate this page” links, etc. These links will all have the same name, so by listing A-Z we’ve made them easy to delete in one fell swoop.

8) You’re done, other than spending a few minutes ferreting out some more unwanted results. Feel free to paste in more such results from Bing, de-duplicate, etc.
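For those comfortable with a little scripting, the extract-and-tidy steps above can be sketched in JavaScript. This is only a minimal sketch and not the script used in this tutorial: the names csvField, linksToCsv and CRUFT are hypothetical, the example.org links are made-up sample data, and it assumes the column order described in step 7 (number, URL, anchor text). It also drops the “Cached”-type links up front, rather than sorting them out later in Excel.

```javascript
// Link titles that Google adds to every result and that we do not want.
const CRUFT = ["Cached", "Similar", "Translate this page"];

// Make a value safe for one CSV cell: collapse whitespace, and wrap the
// value in double quotes if it contains a comma or a double quote.
function csvField(value) {
  const s = String(value).replace(/\s+/g, " ").trim();
  return /[",]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Turn a list of {href, text} objects into numbered CSV rows.
function linksToCsv(links) {
  return links
    .filter(link => !CRUFT.includes(link.text.trim()))
    .map((link, i) => [i + 1, csvField(link.href), csvField(link.text)].join(","))
    .join("\n");
}

// In a browser, the list could be built from the saved results page with:
//   Array.from(document.links, a => ({ href: a.href, text: a.textContent }))
// Here we use made-up sample data instead.
const sample = [
  { href: "http://example.org/a", text: "Wulfhere, King of Mercia" },
  { href: "http://example.org/cache", text: "Cached" },
  { href: "http://example.org/b", text: "Mercian charters" },
];

console.log(linksToCsv(sample));
```

Quoting any title that contains a comma keeps the columns from shifting when the file is opened in Excel.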

If you wanted to re-create a web page of links from the data, delete the first column of numbers, and then save. Open up your saved .csv in Notepad. Now you can do some very simple search and replace operations, to change the list back into HTML…
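The same change back into HTML could also be scripted, instead of done by hand with search and replace. What follows is a rough sketch under stated assumptions, not a finished tool: the function names escapeHtml, rowToHtml and csvToHtml are hypothetical, each row is assumed to be “URL,title” with the number column already deleted, and the URLs are assumed to contain no commas of their own.

```javascript
// Escape characters that have a special meaning in HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Convert one "URL,title" row into an HTML link.
// Assumption: the first comma on the row separates the URL from the title.
function rowToHtml(row) {
  const comma = row.indexOf(",");
  const url = row.slice(0, comma);
  let title = row.slice(comma + 1);
  // Undo CSV quoting on the title, if present.
  if (title.length >= 2 && title.startsWith('"') && title.endsWith('"')) {
    title = title.slice(1, -1).replace(/""/g, '"');
  }
  return '<a href="' + escapeHtml(url) + '">' + escapeHtml(title) + "</a><br>";
}

// Convert the whole CSV text, skipping blank or comma-less lines.
function csvToHtml(csv) {
  return csv
    .split(/\r?\n/)
    .filter(line => line.includes(","))
    .map(rowToHtml)
    .join("\n");
}

const csv =
  'http://example.org/a,"Wulfhere, King of Mercia"\n' +
  "http://example.org/b,Mercian charters";

console.log(csvToHtml(csv));
```

The output can be pasted between the body tags of a blank web page to give a simple list of links.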

(Note: you can also use the excellent £20 Sobelsoft Excel Add Data, Text & Characters To All Cells add-in for complex search & replace operations in Excel)

Ideally there would be free Firefox Greasemonkey scripts, simple freeware utilities, etc, that could do all of this automatically. But, believe me, I’ve looked and there aren’t. Shareware Windows URL extractors are ten-a-penny (don’t waste good money on them, use the free URL Extractor), but not one of them also extracts the anchor text and saves the output as .csv.

Yes, I do know there’s the free Firefox addon Outwit Hub, which via its Data / Lists … option can capture URLs and anchors — but it jumbles everything in the link together, anchor text, snippet, Google gunk, etc, and so the link text requires major cleaning and editing for every link. Even with the hit-and-miss home-brew scraping filters, it’s not a reliable solution.

A marker-pen for Web pages

07 Tuesday Jul 2009

Posted by futurilla in JURN tips and tricks


I often see students who have a pile of hard-copy print-outs from the Web, and they’ve used a yellow or green highlighter pen to mark useful paragraphs or phrases. What if they could do the same for Web pages? There’s a Firefox addon, Wired-Marker that does just that…

“Wired-Marker is a … highlighter that you use on Web pages. The highlighter, which comes in various colors and styles, is a kind of electronic bookmark that serves as a guide when you revisit a Web page. The highlighted content is automatically recorded in a scrapbook and saved. … the highlighted sections remain visible on the page when you revisit … Wired-Marker is freeware … sponsored by the Ministry of Education, Culture, Sports, Science and Technology” … “You can also add notes to the bookmarked items.”

Sadly, it doesn’t yet work in Firefox 3.5, and the last supported version was Firefox 3.1b2.

Update: now working with Firefox 3.5!

Free OCR for Google Book Search pages

05 Sunday Jul 2009

Posted by futurilla in JURN tips and tricks, JURN's Google watch

≈ 11 Comments

Ever wanted to take the hassle out of re-typing a short quote, found on Google Books? Free OCR is a simple online OCR application that might help.

To test it, I gave it a very unpromising bit of text captured from Google Books using a standard screen-capture utility — slightly skewed, slightly fuzzy, in a non-standard typeface I’m willing to bet no-one has on their system, captured as a JPG at a mere 72 dpi, and just 500 pixels wide…

[Image: ocr-test]

A few seconds after uploading, it gave me this…

ADVERTISEMENT.
Tms publication of the Works of Jomv KNOx, it is
supposed, will extend to F’ive Volumes. It was thought
advisable to commence the series with his History of
the Reformation in Scotland, as the work of greatest
importance. The next voliune will thus contain the
Third and Fourth Books, which continue the History to
the year 1564; at which period his historical labeurs
maybeconsideredtoterminate. ButtheFi&hBook,
forming a sequel to the History, and published under
his name in 1644, will also be included. His Letters
and Miscellaneous Writings will be arranged in the
subsequent volumes, as nearly as possible in chronolo-
gical order; each portion being introduced by a separate
notice, respecting the manuscript or printed copies from
which they have been taken.
It may perhaps be expected that a Life of the Author
should have been prefixed to this volume. The Life of
Knox., by Ds. M‘Cms, is however a work so universally
, known, and of so much historical value, as to supersede
l any attempt that might be made for a detailed bio-

Not perfect, but not bad for such a poor-quality capture. Stand-alone OCR software usually demands a much better quality source.

The popular screenshot software HyperSnap v6 promises to do the same with its TextSnap feature, but for some unknown reason this feature just doesn’t work with Google Books or the captured image above. I suspect it can only handle text that uses system fonts.

So until we get a neat free OCR Firefox addon (which is a direction I would urge the makers of Free OCR to go in) then screenshot – save image – upload image to Free OCR is a viable and speedy workflow for OCR-ing fair-use quotes found on Google Book Search or other places that only offer plain page-scans.

Oh, and don’t bother doing this for books that are already in the public domain — since last month Google provides the full-text of these for download, and also serves it up via Google Book Search Mobile.

Update: If you have Microsoft Office 2007 or higher, then I find that the included Microsoft OneNote works just as well for OCR on low-res images such as the one above. It also works well on most PDFs that don’t allow copy/paste. See the comments to this post for details.

intitle: works with Google Blog Search

05 Sunday Jul 2009

Posted by futurilla in JURN tips and tricks, JURN's Google watch

Here’s a useful tip for those who want better precision while wading through the Google Blog Search “blog bog”. The search modifier intitle: works with Google Blog Search.

Firefox 3.5 – how to turn off Google Suggest

30 Tuesday Jun 2009

Posted by futurilla in JURN tips and tricks

≈ 2 Comments

The horrible Google Suggest feature (now even worse because it’s started including “Sponsored links in suggestions”) returns in the long-awaited Firefox 3.5 final, now available for download. That’s because the FF Blocksite addon, previously so useful for rooting out Google Suggest, does not work in 3.5.

Search veterans who want to turn off Google Suggest should instead use the Adblock Plus addon. In Options / My Adblocking Rules, simply block the domain…

clients1.google.com

This is the server that handles Google’s suggestion keywords. Treating it as adware — which indeed it has now become — turns off Suggest.

Google’s numrange modifier

26 Friday Jun 2009

Posted by futurilla in JURN tips and tricks

Here’s a potentially useful search tip for JURN users. Whereas Google’s numrange search modifier doesn’t work with Google Scholar, it does work with JURN. So, for instance…

Mercia Wulfhere 600..700

…will get you articles and book chapters about Wulfhere of Mercia, filtering for pages where the text contains any date between the years 600 and 700 A.D.
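To make the idea concrete, here is a small JavaScript sketch of the kind of test a number-range filter applies to a page’s text. It is a conceptual illustration only, and makes no claim about how Google actually implements numrange; the function name containsNumberInRange and the sample sentences are hypothetical.

```javascript
// Return true if the text contains any whole number between low and high
// (inclusive). Conceptual illustration of a number-range filter.
function containsNumberInRange(text, low, high) {
  const numbers = text.match(/\d+/g) || [];
  return numbers.some(n => {
    const value = Number(n);
    return value >= low && value <= high;
  });
}

console.log(containsNumberInRange("Wulfhere became king of Mercia in 658", 600, 700));
console.log(containsNumberInRange("The Norman Conquest began in 1066", 600, 700));
```

The first sample sentence passes the filter because 658 falls inside the range, while the second does not because 1066 falls outside it.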