News from JURN

~ search tool for open access content

Category Archives: How to improve academic search

Competition for the Google CSE

04 Tuesday Jan 2011

Posted by futurilla in How to improve academic search, JURN's Google watch, Spotted in the news

IndexTank, custom search in a box. Nice idea. But it seems to be aimed at individual businesses looking to reduce their IT overheads, and is useless as a replacement for a Web-wide Google CSE…

“IndexTank doesn’t actively fetch data from you as a web crawler would do. Instead, your application sends IndexTank the data as soon as it is created or updated”

“not a standalone web search engine, and we don’t currently have a way for you to set it up directly through the Web. It requires downloading software such as a WordPress plugin (if you wanted to add better search to your blog, for example) or writing a program to interact with our servers.”

Worse, it can’t even auto-extract indexable text from the PDFs you send it…

“IndexTank, like other full-text search alternatives, indexes only text. However, for common formats like PDF or Word, it is very easy to parse them to obtain the readable text by using open source tools.”
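To make that 'push' model concrete, here is a minimal sketch in Python. It is not IndexTank's actual API: the `PushIndex` class and its method names are invented for illustration, and it assumes you have already pulled the plain text out of any PDF or Word files with some other tool.

```python
import re
from collections import defaultdict


class PushIndex:
    """Toy in-memory index. Hypothetical: NOT IndexTank's real client API."""

    def __init__(self):
        self.docs = {}                    # doc_id -> plain text
        self.postings = defaultdict(set)  # word -> set of doc_ids

    def add_document(self, doc_id, text):
        """Your application calls this whenever a document is created or updated."""
        self.delete_document(doc_id)      # drop stale words if this is an update
        self.docs[doc_id] = text
        for word in set(re.findall(r"\w+", text.lower())):
            self.postings[word].add(doc_id)

    def delete_document(self, doc_id):
        """Remove a document and its words from the index, if present."""
        old_text = self.docs.pop(doc_id, None)
        if old_text is None:
            return
        for word in set(re.findall(r"\w+", old_text.lower())):
            self.postings[word].discard(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

The point of the sketch is that nothing gets indexed unless your own code calls `add_document`. There is no crawler, which is why this sort of service can't stand in for a Web-wide CSE.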

I should mention some of the other ‘sort-of’ search-in-a-box options.

* The old and vulnerable (in the light of the Delicious closure) Yahoo BOSS

* Spinn3r. But it can only supply “A-list” blog content (so possibly not much use for hyperlocal indexing of a city-region), and you have to build your own widget to hook into its API.

* 80 Legs is a pricey monthly-subscription web-crawler. I’m uncertain if their stated ‘URL limit’ refers to the number of URLs on the originating site-list, or the number of files actually found by their crawler. If it’s the latter, you could run out of space very fast.

* And of course the new Blekko, which lets you upload a text file full of your selected URLs, and then uses them to create a ‘slashtag’ that delimits people’s searches. The last one is interesting, and I might eventually have a play around with it. Although possibly that’ll be when you’re no longer limited to 1,000 URLs, and are allowed to use wildcards in the URL list.

It’s great to see some competition for Google CSEs emerging, and perhaps it will eventually spur Google into offering a commercial ‘Deep’ Web-wide version of the Custom Search Engine: full-text deep indexing of all the documents found at any website it’s pointed at; all the documents found are drawn on to produce your custom search results, every time; and the user gets 12,000 URLs to play with. Or perhaps Microsoft Bing will offer such a service. It might be limited to non-profits, so as to keep the SEO spivs out.

Ropey repositories

30 Tuesday Nov 2010

Posted by futurilla in How to improve academic search, My general observations

It’s always been annoying that academic repositories jumble together paywall / no-access / open access material, and don’t allow users to search only for open access + full-text materials. With a very few honourable exceptions, it’s a ridiculous situation — and the so-called library professionals involved in the development of such ‘standards’ should be hanging their heads in shame. Bibliographic Wilderness agrees…

“Really, I’m deeply disappointed that this kind of thing — good metadata that will allow software to know if an item really is OA, and to get a link directly to the content as well as the landing page — doesn’t seem to be a concern of the repository communities. This has been a problem for YEARS, and if any of the various organizations involved in this stuff are even making any efforts to address it, I haven’t heard about it.”
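As a rough illustration of what such metadata would make possible, here is a sketch in Python. The record fields (`is_open_access`, `fulltext_url`, `landing_url`) are names I have made up for the example, not part of any repository standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical repository record; the field names are illustrative only."""
    title: str
    landing_url: str                    # the repository's landing page
    is_open_access: bool                # is the item really OA?
    fulltext_url: Optional[str] = None  # direct link to the content, if any


def open_fulltext_only(records: List[Record]) -> List[Record]:
    """Keep only the records that are open access AND link straight to full text."""
    return [r for r in records if r.is_open_access and r.fulltext_url]
```

With two reliable fields like these on every record, an "open access + full text only" search option would amount to a one-line filter.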

Full text vs. abstracts

27 Saturday Nov 2010

Posted by futurilla in How to improve academic search, Spotted in the news

Jimmy Lin’s “Is searching full text more effective than searching abstracts?”. Conclusion…

“Users searching full text are more likely to find relevant articles than searching only abstracts.”

How to import/export a list of banned URLs from the Google Noise Reduction script, for Firefox + GreaseMonkey.

20 Saturday Nov 2010

Posted by futurilla in How to improve academic search, JURN tips and tricks, JURN's Google watch

You may have spent some time building up a list of banned URLs for the Firefox addon Surfclarity, which strips unwanted domains from Google Search Results. Surfclarity no longer works with the latest Google changes, but the Greasemonkey script Google Noise Reduction does. In this tutorial we’ll swop the Surfclarity blacklist into the Google Noise Reduction blacklist.

1. In Firefox’s address bar, type: about:config.

2. Scroll down to extensions.surfclarity.patterns

Double click on the line of banned URLs you’ll find there, and copy them to Notepad.

3. Scroll further down to greasemonkey.scriptvals.http://exego.net//Google Noise Reduction.blacklist and take a look at the format. Note that it’s a little different than Surfclarity…

({'britannia.com':true, 'oxfordjournals.org':true, 'tandf.co.uk':true, 'ingentaconnect.com':true, 'sagepub.com':true, 'myspace.com':true, 'experts-exchange.com':true})

So we’re going to have to do some basic search-and-replace on our Surfclarity blacklist. Back up the Google Noise Reduction.blacklist if you want, as we’re going to overwrite it in a few moments.

4. Go back to Notepad and look at the list of Surfclarity URLs you just copied out.

Search for : and replace with : ' (note the space after the “:”).

Then search for : and replace it with ':true,

Now add ({' to the very start of this list, and ':true}) to the very end of this list.

Congratulations, you now have your Surfclarity list in Google Noise Reduction format.

5. Copy your new list to the clipboard, go back to greasemonkey.scriptvals.http://exego.net//Google Noise Reduction.blacklist, clear what’s in there at the moment, and then paste the new list in. You’re done.

Obviously, you can now also keep a backup copy of the Google Noise Reduction.blacklist.
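If you'd rather not do the search-and-replace by hand, the same conversion can be scripted. Here is a minimal sketch in Python, assuming (as the steps above imply) that the Surfclarity list is a single line of domains separated by ":". The function names are my own, and the output uses plain straight quotes.

```python
import re


def surfclarity_to_gnr(patterns):
    """Turn 'a.com:b.com' into "({'a.com':true, 'b.com':true})".

    Assumes the Surfclarity value is a ':'-separated list of domains.
    """
    domains = [d.strip() for d in patterns.split(":") if d.strip()]
    return "({" + ", ".join("'%s':true" % d for d in domains) + "})"


def gnr_to_domains(blacklist):
    """Pull the domains back out of a Google Noise Reduction blacklist string."""
    return re.findall(r"'([^']+)'\s*:\s*true", blacklist)
```

Paste your copied Surfclarity line into `surfclarity_to_gnr()` and put the result into about:config as in step 5. `gnr_to_domains()` goes the other way, which is handy if you want a plain list of domains as a backup.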

Carrot 2

16 Saturday Oct 2010

Posted by futurilla in How to improve academic search

Carrot2 is open source software for finding thematic clusters in groups of documents…

“It can automatically organize [and label] small collections of documents, e.g. search results, into thematic categories. Apart from two specialized document clustering algorithms, Carrot2 offers ready-to-use components for fetching search results from various sources including YahooAPI, GoogleAPI, Bing API, eTools Meta Search, Lucene, SOLR, Google Desktop and more.”

“Carrot2 came about as a framework for building search-results clustering engines but its algorithms should successfully cluster up to about a thousand text documents, a few paragraphs each”

Scholarly Publishing through Open Access: A Bibliography

10 Sunday Oct 2010

Posted by futurilla in Economics of Open Access, How to improve academic search, Official and think-tank reports, Open Access publishing

A comprehensive new 2010 bibliography, Transforming Scholarly Publishing through Open Access: A Bibliography.

“…has over 1,100 references, provides in-depth coverage of published journal articles, books, and other works about the open access movement. Many references have links to freely available copies of included works.”

Association of Learned and Professional Society Publishers – 2010 proceedings

23 Thursday Sep 2010

Posted by futurilla in How to improve academic search

A set of podcasts and PowerPoint slides from the Sept 2010 conference of the Association of Learned and Professional Society Publishers.

Including:

* The Seven Crises of Scholarly Publishing : extinction or evolution?
* The Library : the best place for information research?
* Needles in a Virtual Haystack : discoverability as a route to market

ScholarLynk

08 Wednesday Sep 2010

Posted by futurilla in How to improve academic search

Details of a new prototype tool from Microsoft Research: ScholarLynk…

“ScholarLynk is a desktop solution aiming to support researchers in building and maintaining ‘reading lists’ of resources in collaboration with other researchers […] tools for (i) constructing reading lists by tagging the desired resources, (ii) seamlessly incorporating remote data sources as desktop resources, and (iii) supporting in-context communication, sharing of reading lists, and collaboration with other users of the ScholarLynk.

The prototype implementation leverages the DRIVER Infrastructure for European Open Access [repository] publications that currently comprises 2,500,000 publication records from over 250 repositories world wide.”

Semantic enterprise startups – the survey

17 Tuesday Aug 2010

Posted by futurilla in How to improve academic search

A four-part survey on current semantic enterprise startups.

InCite

07 Wednesday Jul 2010

Posted by futurilla in How to improve academic search, JURN's Google watch, Spotted in the news

New on Google Scholar: search within all the papers that cite the one you’re interested in.
