{"id":2603,"date":"2009-07-05T16:01:36","date_gmt":"2009-07-05T16:01:36","guid":{"rendered":"http:\/\/jurnsearch.wordpress.com\/?p=2603"},"modified":"2009-07-05T16:01:36","modified_gmt":"2009-07-05T16:01:36","slug":"free-ocr-for-google-book-search-pages","status":"publish","type":"post","link":"https:\/\/jurn.link\/jurnsearch\/index.php\/2009\/07\/05\/free-ocr-for-google-book-search-pages\/","title":{"rendered":"Free OCR for Google Book Search pages"},"content":{"rendered":"<p>Ever wanted to take the hassle out of re-typing a short quote found on Google Books? <a href=\"http:\/\/www.free-ocr.com\/\">Free OCR<\/a> is a simple online OCR application that might help.<\/p>\n<p>To test it, I gave it a very unpromising bit of text, captured from Google Books with a standard screen-capture utility: slightly skewed, slightly fuzzy, in a non-standard typeface I&#8217;m willing to bet no-one has on their system, saved as a JPG at a mere 72 dpi, and just 500 pixels wide&#8230;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/jurn.link\/jurnsearch\/2009\/07\/ocr-test.jpg\" alt=\"ocr-test\" title=\"ocr-test\" width=\"500\" height=\"641\" class=\"alignnone size-full wp-image-2602\" \/><\/p>\n<p>A few seconds after uploading, it gave me this&#8230;<\/p>\n<blockquote><p>ADVERTISEMENT.<br \/>\nTms publication of the Works of Jomv KNOx, it is<br \/>\nsupposed, will extend to F&#8217;ive Volumes. It was thought<br \/>\nadvisable to commence the series with his History of<br \/>\nthe Reformation in Scotland, as the work of greatest<br \/>\nimportance. The next voliune will thus contain the<br \/>\nThird and Fourth Books, which continue the History to<br \/>\nthe year 1564; at which period his historical labeurs<br \/>\nmaybeconsideredtoterminate. ButtheFi&amp;hBook,<br \/>\nforming a sequel to the History, and published under<br \/>\nhis name in 1644, will also be included. 
His Letters<br \/>\nand Miscellaneous Writings will be arranged in the<br \/>\nsubsequent volumes, as nearly as possible in chronolo-<br \/>\ngical order; each portion being introduced by a separate<br \/>\nnotice, respecting the manuscript or printed copies from<br \/>\nwhich they have been taken.<br \/>\nIt may perhaps be expected that a Life of the Author<br \/>\nshould have been prefixed to this volume. The Life of<br \/>\nKnox., by Ds. M\u2018Cms, is however a work so universally<br \/>\n, known, and of so much historical value, as to supersede<br \/>\nl any attempt that might be made for a detailed bio-<\/p><\/blockquote>\n<p>Not perfect, but not bad for such a poor-quality capture. Stand-alone OCR software usually demands a much higher-quality source.<\/p>\n<p>The popular screenshot software HyperSnap v6 promises to do the same with its TextSnap feature, but for some unknown reason TextSnap just doesn&#8217;t work on Google Books pages or on the captured image above. I suspect it can only handle text rendered in system fonts.<\/p>\n<p>So until someone releases a neat, free OCR add-on for Firefox (a direction I would urge the makers of <a href=\"http:\/\/www.free-ocr.com\/\">Free OCR<\/a> to take), the workflow <strong>screenshot &#8211; save image &#8211; upload image to <a href=\"http:\/\/www.free-ocr.com\/\">Free OCR<\/a><\/strong> is a viable and speedy way to OCR fair-use quotes found on Google Book Search or other sites that offer only plain page scans.<\/p>\n<p>Oh, and don&#8217;t bother doing this for books that are already in the public domain: since last month, Google has offered the full text of these for download, and also serves it up via <a href=\"http:\/\/books.google.com\/m\">Google Book Search Mobile<\/a>.<\/p>\n<p>&nbsp;&nbsp;&nbsp;<strong>** Update<\/strong>: If you have <a 
href=\"http:\/\/www.amazon.co.uk\/gp\/product\/B000HCZ8EO?ie=UTF8&amp;tag=httwwwdloginf-21&amp;linkCode=as2&amp;camp=1634&amp;creative=6738&amp;creativeASIN=B000HCZ8EO\">Microsoft Office 2007<\/a> or higher, then I find that the included Microsoft OneNote works just as well for OCR on low-res images such as the one above.  It also works well on most PDFs that don&#8217;t allow copy\/paste. See the comments to this post for details.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ever wanted to take the hassle out of re-typing a short quote, found on Google Books? Free OCR is a &hellip;<\/p>\n<p><a href=\"https:\/\/jurn.link\/jurnsearch\/index.php\/2009\/07\/05\/free-ocr-for-google-book-search-pages\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8,9],"tags":[],"class_list":["post-2603","post","type-post","status-publish","format-standard","hentry","category-jurn-tips-and-tricks","category-jurns-google-watch"],"_links":{"self":[{"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/posts\/2603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/comments?post=2603"}],"version-history":[{"count":0,"href":"https:\/\/jurn.link\/jurnsear
ch\/index.php\/wp-json\/wp\/v2\/posts\/2603\/revisions"}],"wp:attachment":[{"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/media?parent=2603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/categories?post=2603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jurn.link\/jurnsearch\/index.php\/wp-json\/wp\/v2\/tags?post=2603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}