
News from JURN

~ search tool for open access content


Category Archives: Academic search

filetype:pdf now working in Bing

23 Tuesday Jun 2009

Posted by futurilla in Academic search, JURN tips and tricks


filetype:pdf is now working in Bing, the new Microsoft search-engine. Last time I tried Bing, that search modifier was not enabled.

The perils of long article titles

23 Tuesday Jun 2009

Posted by futurilla in Academic search, JURN tips and tricks, JURN's Google watch


Here’s a useful tip: Google’s intitle: search modifier only works if the title shown on the search-results link contains the phrase. It seems that Google is not reading the article title from your metadata, but instead reading it from the links on a larger ‘upstream’ set of search results pages. For instance, searching for intitle:"The Searchers" Ford will not pick up…

   “Home on the Range: Space, Nation, and Mobility in John Ford’s The Searchers”
   from The Japanese Journal of American Studies, No. 13 (2002)

…because the article appears in search results as…

[Image: goog-title]

As you can see, “The Searchers” has dropped off the end of the link to be replaced with three dots. So using intitle: doesn’t find it.

Article titles should be around 50 characters or fewer (including spaces) to fit comfortably on a Google link, or in a 500-pixel-wide blog column, for that matter.

Google Scholar is more forgiving, only hitting the same problem at around 100 characters. But JURN works like the main Google, and so users should be aware of the difference.
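The truncation effect described in this post can be approximated in a few lines of Python. This is only a rough sketch: it assumes the displayed link is cut at a fixed character count and finished with three dots, which is an assumption rather than documented Google behaviour, and the function names `displayed_title` and `intitle_would_match` are hypothetical. The 50 and 100 character limits are the approximate figures given above.

```python
def displayed_title(title: str, limit: int = 50) -> str:
    """Return the title as it might appear on a search-results link.

    Assumption: anything past `limit` characters is dropped and
    replaced with three dots.
    """
    if len(title) <= limit:
        return title
    return title[:limit].rstrip() + "..."


def intitle_would_match(phrase: str, title: str, limit: int = 50) -> bool:
    """True if the phrase survives in the (possibly truncated) link text."""
    return phrase.lower() in displayed_title(title, limit).lower()


title = ("Home on the Range: Space, Nation, and Mobility "
         "in John Ford's The Searchers")

print(displayed_title(title))
print(intitle_would_match("The Searchers", title, limit=50))   # False
print(intitle_would_match("The Searchers", title, limit=100))  # True
```

Running a planned article title through a check like this before publication shows whether the key phrase falls inside the part of the title that is likely to survive on the link.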

A review of three major academic search-engines

22 Monday Jun 2009

Posted by futurilla in Academic search, JURN's Google watch


Following my own group-test, it’s interesting to see that Peter at Gale Reference Review has just published a detailed May 2009 review of three major academic search-engines. He takes a skeptical look at Web of Science (WoS), Scopus and Google Scholar. The article is rather long, but here are some interesting quotes…

“Google Scholar […] reports implausibly high citedness counts for most items, which becomes quite obvious when tracing the purportedly citing papers”

“I looked at the widely touted figures in the promotional materials [ of WoS and Scopus and found ] they should not be taken for granted. Many of these are incorrect and exaggerated. Their compilation has been fast and loose, sometimes making them fiction rather than fact.”

“The coverage of arts & humanities [ in Scopus ] is extremely poor (representing barely 1% of the database) [ and by comparison ] Web of Science has about […] 10 times as many for arts & humanities.” [ and even if Scopus gets a boost, as proposed, it would still only have ] “about 1/6th of what Web of Science has for these disciplines”

“It is one thing that Scopus has no cited references in records for papers published before 1996, but it adds insult to injury that the pre-1996 papers are ignored. This results in absurdly low h-index for many of the senior teaching and research faculty members and independent researchers who published papers well before 1996 which have been widely cited in the past 25-35 years […] Lazy administrators and bureaucrats stop here and ignore [ worthy people ] for some lifetime award”
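To see why ignoring pre-1996 papers hurts senior researchers so badly, it helps to work the h-index through on a small example. The h-index is the largest number h such that h of an author's papers have each been cited at least h times. The sketch below uses an entirely hypothetical publication list (the years and citation counts are invented for illustration) and compares the index over all papers with the index over papers from 1996 onward only.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical (year published, times cited) pairs for one senior author.
papers = [
    (1978, 120), (1982, 80), (1985, 60), (1988, 50), (1990, 45),
    (1994, 30), (1999, 4), (2003, 2), (2006, 1),
]

all_papers = h_index(c for _, c in papers)
from_1996_only = h_index(c for year, c in papers if year >= 1996)

print(all_papers, from_1996_only)  # 6 2
```

With these invented numbers, dropping the pre-1996 papers cuts the index from 6 to 2, even though the early papers are the widely cited ones.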

OutWit Docs

20 Saturday Jun 2009

Posted by futurilla in Academic search, How to improve academic search, JURN tips and tricks


Have you ever wanted to rip all the PDF and DOC files from a focussed Google or Google Scholar search, quickly save them all to a folder, index them with something powerful like dtSearch, and then search the real full-text from across all of them — rather than whatever bits the Googlebot indexed as it swept past, and whatever bits the Google Search ran its search from?

Or archive the entire run of a PDF ejournal that’s sitting at site:www.our-ejournal/articles/ ?

The new free OutWit Docs Firefox plugin does that, and works with the latest version of Firefox. There’s one major drawback — it hijacks the space right next to your browser’s Home icon, with a naff shiny 3D-stylee icon…

[Image: ugly]

Unacceptable. It can, however, be moved after a bit of fiddling (right-click, ‘customize’, and drag it out) and then placed somewhere a little more suitable and out of sight.

When using it, though, you also quickly come to appreciate why people should name their academic PDF files something_meaningful.pdf rather than xy2f6fjg00.pdf. And why filenames should have the year rather than the month first…

[Image: pdfnames]
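The year-first point is easy to demonstrate, because a download folder is normally listed in plain alphabetical order. The file names below are hypothetical, as is the `year_first_name` helper; the sketch only shows how the two naming schemes behave under an ordinary sort.

```python
# Hypothetical file names for the same three issues of one journal.
month_first = ["jan_2005.pdf", "mar_1998.pdf", "nov_2001.pdf"]
year_first = ["2005_01.pdf", "1998_03.pdf", "2001_11.pdf"]

# Alphabetical order groups by month name, so the years come out jumbled.
print(sorted(month_first))

# Alphabetical order is also chronological order when the year leads.
print(sorted(year_first))


def year_first_name(year: int, month: int, slug: str) -> str:
    """Build a sortable, human-readable file name (hypothetical scheme)."""
    return f"{year:04d}_{month:02d}_{slug}.pdf"


print(year_first_name(1998, 3, "field_artillery"))
```

Zero-padding the month matters here: without it, month 11 would sort before month 3.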

As a severe test of what is, after all, a mere version 0.1.0.20 app, it took 9 minutes to whisk through 90 years’ worth of the Field Artillery journal (1911-2007), running from a Google search of site:sill-www.army.mil/FAMAG/ , to find 800MB in 996 PDF files, and then to start downloading them. This was, of course, the point at which I wanted OutWit to have a big red STOP button, although quitting the app did the trick.

Where Google Scholar stands on art history

17 Wednesday Jun 2009

Posted by futurilla in Academic search, JURN's Google watch


Hannah Noll’s paper for her M.S. in Library Science degree, Where Google Scholar Stands on Art: An Evaluation of Content Coverage in Online Databases (PDF link, 300kb)…

“This [ 2008 ] study evaluates the content coverage of Google Scholar and three commercial databases (Arts & Humanities Citation Index, Bibliography of the History of Art, and Art Full Text/Art Index Retrospective) on the subject of art history. Each database is tested using a bibliography method and evaluated based on Peter Jacso’s scope criteria for online databases. Of the 472 articles tested [ * ] , Google Scholar indexed the smallest number of citations (35%), outshone by the Arts & Humanities Citation Index which covered 73% of the test set. This content evaluation also examines specific aspects of coverage to find that in comparison to the other databases, Google Scholar provides consistent coverage over the time range tested (1975-2008) and considerable access to article abstracts (56%). Google Scholar failed, however, to fully index the most frequently cited art periodical in the test set, Artforum International. Finally, Google Scholar’s total citation count is inflated by a significant percentage (23%) of articles which include duplicate, triplicate or multiple versions of the same record.”

* tested with a set of “article citations authored by a pre-selected set of art historians” via 12 names “culled from the Dictionary of Art Historians”, according to the paper. Authors had to be British or American, and born after 1925.

It’s interesting that Noll rejects keyword searches as a test measure…

“Searching by a compiled list of subject terms did not seem appropriate for testing Google Scholar. Google Scholar lacks a system of controlled vocabulary and search results reflect in many cases a full-text search of the document, whereas traditional databases only search the title and abstract keywords of a record.”

… yet Noll might easily have used intitle:"title of the article" with Google Scholar to find specific articles. The intitle: search modifier is not mentioned in the paper. Instead Noll used a wider author search, then trawled the results for the target titles, but admits that with this method of using Google Scholar…

“some articles may have been impossible to find by using the author search.”

Ancient Athens

15 Monday Jun 2009

Posted by futurilla in Academic search


From the blogs. ‘Why I made JURN’, part 199…

“…it’s so difficult to find a lot of academic articles online unless you subscribe to a service like JSTOR. It took me a good couple of hours to unearth a research article the other day using Athens/JSTOR (it was so much hassle that it would almost have been easier to go to the library).”

…and this from a savvy Web 2.0 staff-development trainer, blogging from a major British university. How must some of the less able undergraduates fare?

Two new full-text research assistance services

13 Saturday Jun 2009

Posted by futurilla in Academic search, Spotted in the news


A couple of new commercial start-ups are offering full-text research assistance services in medicine and science, promising to outsource some of the heavy lifting for librarians: Pubget and the rather clunkily named Mighty Linkout Machine. Amazingly, given the seemingly enormous resources poured into science journals and elite universities, these services are said to be needed because scientists and doctors are…

“frustrated by the challenge of getting full-text PDF access to science journal articles — even while working inside well-endowed institutions like Harvard and Oxford”

Giving free JSTOR access to alumni

13 Saturday Jun 2009

Posted by futurilla in Academic search, Spotted in the news


Now here’s a nice move. Southern Illinois University is giving free JSTOR access to its alumni…

“SIU alums can access JSTOR anywhere in the country after registering on the Alumni Association Web site.”

If there was one thing that would get me back in touch with my old alma mater, after having lost touch with the alumni magazine during a few house moves, that would be it.

Common Tag and Search BOSS

13 Saturday Jun 2009

Posted by futurilla in Academic search, How to improve academic search


This looks somewhat interesting. Just launched, Common Tag…

“is an open tagging format developed to make [ Web ] content more connected, discoverable and engaging. Unlike free-text tags, Common Tags are references to unique, well-defined concepts, complete with metadata and their own URLs.”

From what I read, it sounds a bit like herding cats — attempting to persuade (firstly) bloggers and social bookmarkers to use standardised vocabularies and terminology for content tagging. I suspect it’ll find difficulties in gaining traction, simply due to the sheer size of the Web. Nice logo, though…

[Image: commont]

It would be interesting to see an academic version, which could auto-read a document, then suggest and automatically embed tags (as microformats or RDFa?) using Art & Architecture Thesaurus (AAT) terms.

And I just found out about Yahoo Search BOSS, which seems to have been around in mature form since late 2008. It’s Yahoo’s competitor to Google CSE. It seems to have appeared during their recent takeover troubles, which doesn’t inspire confidence. However, it’s getting new features and appears to be under active development. New sorting functions have apparently been added to BOSS, offering sorting by date and/or a specified time range (although it seems that may be limited to custom News search?). There’s also a Python-driven mashup feature, although at present people seem to be using this to add rather naff-looking context-aware sidebars alongside search results. There’s also a kicker in the small print…

“In the near future, we will be introducing a fee structure for BOSS”

If sorting by date were a feature that could be added to Google CSE results, and a keyword-targeted RSS feed were then allowed to run from that sorting, JURN could feed you a usable approximation of a rolling keyword-specific table-of-contents alert from 3,000+ ejournals. Does the current standard open access ejournal publishing software allow that sort of cross-journal alerting service, I wonder?
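The logic of such an alert is simple once dated results are available. The sketch below is not tied to Google CSE, BOSS or any real feed format: the `Article` record, the `keyword_alert` function and the sample entries are all hypothetical, and it assumes each result arrives with a title, a journal name and a publication date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    journal: str
    published: date


def keyword_alert(articles, keyword, limit=10):
    """Newest-first list of articles whose title mentions the keyword."""
    needle = keyword.lower()
    hits = [a for a in articles if needle in a.title.lower()]
    return sorted(hits, key=lambda a: a.published, reverse=True)[:limit]


# Invented sample results standing in for a cross-journal index.
results = [
    Article("Water mills in medieval Wales", "Journal A", date(2009, 3, 2)),
    Article("Early cinema audiences", "Journal B", date(2009, 6, 1)),
    Article("Tide mills and water mills compared", "Journal C", date(2009, 5, 20)),
]

for article in keyword_alert(results, "water mills"):
    print(article.published, article.journal, article.title)
```

Each run of the filter could then be written out as feed entries, giving the rolling table-of-contents effect described above.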

Open access search?

12 Friday Jun 2009

Posted by futurilla in Academic search, How to improve academic search, My general observations


Following on from my previous post… a search for “open access” site:www.google.com/coop/ was discouraging. There are about twenty “living-dead” Custom Search Engines from 2006, but no large ones updated after 2006 (so far as I could tell from a quick visit).

Pouring out all this open access content is all very well, but where’s the competition and development in open access search?

And where are the simple common standards for flagging open content for search-engine discovery and sorting, for that matter? Judging by the structure and look of most academic repositories, internet search-engines are the last things on their minds.

Now of course I’m viewing things from the outside, as an independent curator and social entrepreneur, not a librarian or OA evangelist. But it seems to me that burying your PhD thesis deep in a repository cattle-car (seemingly with only a few keywords, an ugly template and an impenetrable URL for company) isn’t serving it or the author very well, especially in terms of metadata and tagging leading to full-text search discovery. As the authors of “Experiences in Deploying Metadata Analysis Tools for Institutional Repositories” recently wrote in Cataloging & Classification Quarterly (No. 3/4, 2009)…

“Current institutional repository software provides few tools to help metadata librarians understand and analyse their collections.”

Which doesn’t bode well for search-engines aiming to hook into and sort the same metadata. That sort of statement might have been acceptable in 1999, but it’s a damning statement to hear from librarians in 2009. And another paper in the same issue concludes that there is…

“a pressing need for the building of a common data model that is interoperable across digital repositories”.

Now I wouldn’t know a Dublin Core from a Dublin Pint, but how difficult would it have been to build a search-engine friendly tag that allows a repository to tell the world “this is a root free-to-all full-text file” and “you’re not going to get any full-text for this title”? Or to allow the “one-click” filtering out of science and medical-related OA material across search results from a thousand repositories?

This could be done at the URL level. For example by using a standard universal URL structure that could be read by machines and humans alike. For a journal it might run something like:

   www.technology-history.org/journal-issue-004/free-full-text/2009_adams_preindustrial_water_mills.html

Where preindustrial_water_mills are the first three words of the article title.

Without even accessing the document, a human can now glance at the URL in search results and read off:

   Journal name (Technology History)
   Issue number (Number 4)
   It’s from a journal
   It’s free full-text
   The year published (2009)
   The author surname (Adams)
   The first three words of the article title (“preindustrial water mills”)
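A machine can read off the same fields just as easily. The following is a minimal sketch for the journal form of the URL only, assuming exactly the four-part layout proposed above (host, issue segment, access segment, file name); `parse_article_url` and the dictionary keys are hypothetical names, not part of any existing standard.

```python
def parse_article_url(url: str) -> dict:
    """Split a URL in the proposed journal layout into its fields."""
    # Drop any scheme, then split into host and three path segments.
    without_scheme = url.split("://", 1)[-1]
    host, issue_segment, access_segment, filename = without_scheme.split("/")

    # "www.technology-history.org" -> "Technology History"
    labels = host.split(".")
    if labels[0] == "www":
        labels = labels[1:]
    journal_name = labels[0].replace("-", " ").title()

    # "journal-issue-004" -> kind "journal", issue number 4
    kind, _, issue_number = issue_segment.partition("-issue-")

    # "2009_adams_preindustrial_water_mills.html" -> year, surname, title words
    stem = filename.rsplit(".", 1)[0]
    year, surname, title_words = stem.split("_", 2)

    return {
        "journal": journal_name,
        "kind": kind,
        "issue": int(issue_number),
        "free_full_text": access_segment == "free-full-text",
        "year": int(year),
        "author": surname.title(),
        "title_start": title_words.replace("_", " "),
    }


info = parse_article_url(
    "www.technology-history.org/journal-issue-004/free-full-text/"
    "2009_adams_preindustrial_water_mills.html"
)
print(info)
```

The repository form has more path segments, so it would need its own, slightly longer, splitting rule.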

For a repository it could look something like:

   www.uni.edu/oa-repository/free-full-text/theses/history/history-of-technology/2009_adams_preindustrial_water_mills.html

And with a uniform standard for URL structures, university IT techies would not be allowed to fiddle with the directory structure and thus break the URL. All full-text files in U.S. repositories could then be searched simply by indexing one line:

http://www.*.edu/oa-repository/free-full-text/
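As a rough illustration of how a crawler or filter might apply that one line, the wildcard can be turned into a regular expression. This is a sketch under assumptions: the `*` is taken to mean one or more host-name labels, `is_free_full_text` is a hypothetical helper, and the `abstract-only` path in the examples is invented purely to show a non-matching URL.

```python
import re

# "http://www.*.edu/oa-repository/free-full-text/" as a regular expression.
# The wildcard is restricted to host-name characters so that it cannot
# swallow a slash and match a path on some unrelated site.
OA_PATTERN = re.compile(
    r"^http://www\.(?:[a-z0-9-]+\.)+edu/oa-repository/free-full-text/",
    re.IGNORECASE,
)


def is_free_full_text(url: str) -> bool:
    """True if the URL sits under the proposed free-full-text path."""
    return OA_PATTERN.match(url) is not None


examples = [
    "http://www.uni.edu/oa-repository/free-full-text/theses/history/"
    "history-of-technology/2009_adams_preindustrial_water_mills.html",
    "http://www.uni.edu/oa-repository/abstract-only/theses/history/x.html",
    "http://www.technology-history.org/journal-issue-004/free-full-text/"
    "2009_adams_preindustrial_water_mills.html",
]

for url in examples:
    print(is_free_full_text(url), url)
```

Only the first example matches; the second is on the wrong path and the third is not on a .edu host.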

Anyway, rant over. I did find a large Google CSE for Economics. Not much use for the arts and humanities you might think, and last updated in 2006, but due to its sheer size (23,613 sites from apparently reputable sources) searches for…

“creative economy” keyword

“creative industries” keyword

“art market” keyword

… all seem to show it still has some use as a discovery tool.
