News from JURN

~ search tool for open access content

Author Archives: futurilla

Ten problems with Google Custom Search Engines

03 Thursday Dec 2009

Posted by futurilla in How to improve academic search, My general observations

1) Google often doesn’t seem to index quite everything at a site. Nor does it always index everything on a page or in a PDF file. Or perhaps it does index everything, but the algorithm that shapes each set of search results jettisons a few results for various reasons? The other possibility is that Google’s results are drawn from a pool of ‘shards’ of previous results, rather than direct from the core crawl data.

   Solution: Google “Caffeine” and subsequent revamps?

2) Results from the main Google search can sometimes differ from those in your CSE. Your CSE will occasionally give radically fewer results from a site than the main Google does. Google doesn’t explain why this is, or the mechanism behind it. Perhaps there are several different versions of the Google index. Results are often much better when using a more sophisticated search method than simple keywords, searching “for phrases” for instance. Sometimes you have to give up on trying to get your CSE to “see” the PDFs you want (although these are visible to the main Google), and instead find a way to index just the linked table-of-contents pages (which will usually show up in your CSE).

   Solution: A lot of extra work. Google could offer a “full Google” CSE to worthy non-profits.

3) Academics love to store the real content at some location that has a different URL than their home-page does. An unoptimised CSE may thus index a website containing ten pages, but not the 10,000 articles that they point to.

   Solution: A lot of extra work, of the sort that JURN has undertaken, to find and then optimise the real “content location URL”.
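
One rough way to start hunting for the real “content location URL” is to tally where a journal's home-page links actually point. The sketch below is only an illustration, not how JURN does it: the function name and the example.edu URLs are hypothetical, and it assumes the outbound links have already been collected by some other means.

```python
from collections import Counter
from urllib.parse import urlparse


def likely_content_locations(home_url, outbound_links, top=3):
    """Rank off-site host + first-directory prefixes linked from a home page."""
    home_host = urlparse(home_url).netloc.lower()
    counts = Counter()
    for link in outbound_links:
        parts = urlparse(link)
        host = parts.netloc.lower()
        if not host or host == home_host:
            continue  # skip relative links and links back to the home site
        # Keep only the first directory, e.g. "archive/" from "/archive/v1/a.pdf"
        segments = [s for s in parts.path.split("/") if s]
        first_dir = segments[0] + "/" if len(segments) > 1 else ""
        counts[host + "/" + first_dir] += 1
    return counts.most_common(top)


# Hypothetical links gathered from a ten-page journal website.
links = [
    "http://repository.example.edu/archive/v1/a.pdf",
    "http://repository.example.edu/archive/v1/b.pdf",
    "http://repository.example.edu/archive/v2/c.pdf",
    "http://www.example.edu/journal/about.html",
    "http://other.example.org/news.html",
]
print(likely_content_locations("http://www.example.edu/journal/", links))
```

The prefix that most links share is a candidate for the URL to add to the CSE, though it still needs checking by hand.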

4) Initial URL gathering can be arduous. Techies and web editorial staff at universities love to juggle directory structures, often for no discernible reason, and thus break links. Link-rot is severe in ejournal lists from more than two years ago, and lists over four years old often have around 80% dead links.

   Solution: Techies need to set up robust redirects if they really have to break URLs. “Self-destruct tags” that delete a links-list page after a certain date, if it hasn’t been updated for more than two years.
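
Both halves of that solution can be sketched in a few lines. This is a minimal illustration under stated assumptions: the redirect table, the paths and the function names are all hypothetical, and the two-year cut-off is approximated as 730 days.

```python
from datetime import date

# Hypothetical table of moved paths: old location -> new location.
REDIRECTS = {
    "/ejournals/2005/list.html": "/library/ejournals/list.html",
    "/library/ejournals/list.html": "/library/journals/",
}


def resolve(path, redirects=REDIRECTS, max_hops=10):
    """Follow the redirect table until the path stops changing."""
    seen = set()
    while path in redirects and path not in seen and len(seen) < max_hops:
        seen.add(path)  # remember visited paths so a loop cannot run forever
        path = redirects[path]
    return path


def is_stale(last_updated, today, max_age_days=730):
    """True if a links-list page has gone more than about two years without an update."""
    return (today - last_updated).days > max_age_days


print(resolve("/ejournals/2005/list.html"))
print(is_stale(date(2007, 6, 1), date(2009, 12, 3)))
```

A page flagged by `is_stale` could then be hidden, or marked as unmaintained, rather than being left to mislead people.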

5) Google CSEs cannot pick specific content (e.g.: a run of journal issues) from the meaningless database-driven URLs commonly found in academic repositories, since there is no repeating URL structure to grab onto. It’s a question of indexing “all or nothing”.

   Solution: URL re-mapping services that are recognised and can be “unwrapped” by Google? Plain HTML “overlay” TOCs.
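
To see why the lack of a repeating URL structure matters, here is a small sketch that uses Python's `fnmatchcase` as a stand-in for wildcard URL patterns. It is not Google's actual matcher, and the pattern and URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical include pattern for one journal's run of issues.
PATTERN = "journals.example.edu/philosophy/vol*/*"

urls = [
    "journals.example.edu/philosophy/vol12/article3.html",
    "journals.example.edu/history/vol12/article3.html",
    "repository.example.edu/view.php?id=48213",
    "repository.example.edu/view.php?id=48214",
]

# Only URLs with a meaningful, repeating path can be picked out by a pattern.
matched = [u for u in urls if fnmatchcase(u, PATTERN)]
print(matched)
```

The two repository URLs differ only by an opaque id, so no pattern can say “this journal but not that one”. The choice is to include everything under `view.php` or nothing.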

6) Editors don’t enforce proper file-names on published documents, which means many CSE search results are titled in the Google results as something like “&63! print_only sh4d7gh.indd” rather than “My Useful Title”. Nor do people add the home location URL and website title to the body of their document — which means that scholars can waste several minutes per article trying to find out where it came from. Some students may never manage to find the journal title for the article they downloaded.

   Solution: Better publication standards at open access and independent ejournals.

7) Large Google CSEs are easy to make, but take a lot of hand-crafting to properly optimise and maintain. “Dead” CSEs from late 2006, when the CSE service first appeared, litter the web. Most of these were also un-optimised. Despite the potential of CSEs, it’s really hard to find large subject-specific CSEs that are both optimised and maintained. Most people now seem to use CSEs for indexing a single site or a small cluster of sites that they own.

   Solution: Users should remove old circa-2006 CSEs from the web. Subject-specific academic and business groups should consider building a collaborative CSE rather than a wiki.

8) Google’s search result ranking doesn’t work as well as it might in tightly defined academic searches. The PageRank wants to evenly “spread the results” across a variety of sites, and thus you’ll rarely see results from just one site dominating the first ten hits – although that may be exactly what a tight academic search requires.

   Solution: For some types of CSE, this could probably be solved by delving into the optimisation features that Google offers for linked CSEs. Update: Google appears to have tweaked the algorithms to fix this problem.

9) Google searches have a problem with finding text at the end of long article titles, of the kind which are common in academia.

   Solution: Authors and publishers should work to keep article and page titles under 50 characters.
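
An editor could check a table of contents against that limit with something as simple as the sketch below. The function name and the sample titles are made up for illustration, and “under 50” is read as 49 characters or fewer.

```python
def overlong_titles(titles, limit=50):
    """Return (length, title) pairs for titles that are not under the limit."""
    return [(len(t), t) for t in titles if len(t) >= limit]


# Hypothetical article titles.
titles = [
    "Hedgerows in Tudor England",
    "Reading the landscape: hedgerows, enclosure and the making of the English field system, 1500 to 1750",
]
for length, title in overlong_titles(titles):
    print(length, title)
```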

10) You can’t have your CSE do a “search within search results”.

   Solution: Manually build a set of pages containing the result URLs you want indexed, then get Google to see these as static pages which can then be added to your CSE.
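
The page-building step can be scripted rather than done entirely by hand. A minimal sketch, assuming you already have the list of result URLs; the function name and the example URLs are hypothetical, and getting Google to crawl the finished page is a separate job.

```python
from html import escape


def build_links_page(title, urls):
    """Return a plain static HTML page listing each URL as a link."""
    items = "\n".join(
        f'    <li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(title)}</title></head>\n"
        f"<body>\n  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )


page = build_links_page(
    "Saved results: medieval hedgerows",
    ["http://example.org/a?x=1&y=2", "http://example.org/b.pdf"],
)
print(page)
```

Save the output as a static `.html` file on a site that Google already crawls, then add that page's URL to the CSE.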

Five new titles added

03 Thursday Dec 2009

Posted by futurilla in New titles added to JURN

Added to the JURN site-index today. JURN is now indexing over 3,500 titles:—

Site/Lines (“a literary forum for essays and reviews of books, exhibitions, and designs dealing with landscape themes and projects” – publication of the Foundation for Landscape Studies)

Pli : the Warwick Journal of Philosophy

Journal of Religious Culture

Ivy Journal of Ethics (applied bioethics, published by the Bioethics Society of Cornell)

Ecclesiology Today (British church buildings and furnishings)

JISC e-books report

02 Wednesday Dec 2009

Posted by futurilla in Official and think-tank reports

A new report from the UK, JISC national e-books observatory project: Key findings and recommendations (PDF link, 1.2Mb)…

“Behavioural evidence from the Observatory project strongly suggests that [university] course text e-books are currently used for quick fact extraction and brief viewing rather than for continuous reading, which may conflict with the assumptions about their use made by publishers (and authors). They are being used as though they are encyclopedias or dictionaries rather than extended continuous text.”

Eno on classification and the death of uncool

02 Wednesday Dec 2009

Posted by futurilla in My general observations, Spotted in the news

Brian Eno in Prospect magazine, on the death of uncool…

“There’s a whole generation of people able to access almost anything from almost anywhere, and they don’t have the same localised stylistic sense that my generation grew up with. It’s all alive, all “now,” in an ever-expanding present, be it Hildegard of Bingen or a Bollywood soundtrack. The idea that something is uncool because it’s old or foreign has left the collective consciousness.”

Why is this interesting here on the JURN blog? Because Eno relates this apparent change to increasingly nuanced classifications of cultural products. Which must arise partly from our ability to tag and generally re-clump cultural products into ever finer categories (Amazon Listmania lists, Spotify playlists, etc) online, although one can see ample evidence that this was starting to happen in music before 1995 and the Web. Possibly there’s also some spillover from huge genre blockbusters, since better classification and cultural navigation routes mean that far more people can now migrate out from quality blockbuster experiences to similar but much more obscure product (e.g. from Harry Potter to The Giant Under The Snow).

Eno perhaps misses some subtleties. Category-proliferation is inclusive in the online world (Wikipedia pages which easily explain the finer points of said classification to the un-initiated, searches that quickly offer up frictionless samples of it, and easy-access online communities of interest). This plenitude helps to spread the range of sustained interests people have, which means British politeness has to go into overdrive to keep up when we meet someone in person and they start talking about their interests, thus possibly contributing to the demise of “uncool”. But the real-world groups forming around / promoting these categories remain exclusionary, since age-related group dynamics and simple shyness kick in (you won’t see many over-40s at your 8-bit electropop game-music night, or groups of eager adolescents at a classical concert). They are perhaps even more exclusionary because the categories are so niche, and so the fragile boundaries need all the more patrolling. “Uncool” still potently exists in the real world of cultural events, and in musical terms it’s still tightly intertwined with social class and age and personal prettiness.

Eno concludes on a hopeful note, though, by suggesting that…

“The sharing of art is a precursor to the sharing of other human experiences” … “what is pleasurable in art becomes thinkable in life”

I’m not sure that’s likely, at least not in the British context. The British climate has always been conducive to us drawing the curtains and “living in our imaginations” for six months of the year, often while sampling all sorts of exotic and fantastical influences and stories, but it doesn’t seem to have made the national character any the less reserved.

And I think it might be more useful to consider “old or foreign” as separate issues. Eno is being quietly political, by casually conflating them. Although, in the end, it’s true that they’re part of the same process of cultural assimilation and re-invention.

The British have always seen “the foreign” as potential material to be quietly appropriated and re-worked into the national culture and national identity. Be wary when the British start to pay serious cultural attention to “the foreign” — we usually want to assimilate it and neuter it. The attitude is that we don’t openly talk much about that process, though — hence the social usefulness of “uncool” at the moment of appropriation, while under the surface we’re actually quietly exotic-ising it so as to extract all the cool we can, ready for eventual re-shaping and re-deployment in the “taste wars” that have long served as a useful proxy for all sorts of other polite social conflicts in the British Isles. And then 30 years on, once it’s safely drained, to claim bits of it as our own and to forget its origin.

And popular unashamed interest in “the old” is nothing new. This neo-romantic antiquarian strain has been visible everywhere in British pop culture since circa 1966/7, from Pink Floyd weaving references to Hereward the Wake into their lyrics, to the Beatles’ neo-Victorian dress and moustaches on Sgt. Pepper, Peter Gabriel on Solsbury Hill, Jarman’s re-imagining of Shakespeare, Morrissey’s love of graveyards, Vivienne Westwood’s clothes, Edward Larrikin warbling “everything that I adore came well before 1984”, to modern antiquarians such as Julian Cope. There are many parallels in art, film, and literature. There’s always been a sense that the past is a mine to be plundered for contemporary cultural production. What has changed recently in the culture is perhaps the sudden breakdown of the Blairite hegemony around Englishness and history, and that is perhaps what Eno is picking up on where he talks of…

“The idea that something is uncool because it’s old … has left the collective consciousness.”

Although this is certainly not the case with our architecture, where the credo among planners is still very much “old = neglect it, so we can demolish it”.

Teaching humanities search

02 Wednesday Dec 2009

Posted by futurilla in Academic search

Wayne Bivens-Tatum at Princeton, on teaching modern undergraduate humanities search techniques…

“…humanities reference has changed from being question-driven to being project-driven […] From students at all levels, I’m asked not for answers to questions, but for strategies of research. It seems crucial for my work not just to know that X database or Y book might cover a field or have an answer, but to be able to map a research strategy for a specific research question or project. […] might involve searching databases in various fields, thinking about various ways to approach the topic, different avenues of exploration, different ways of conceiving the question depending on what resources we find, etc. This is especially true as the students engage in interdisciplinary work.”

All of which rings true. He offers a list of skills a modern humanities librarian might need at the undergraduate level. I might add to the list…

* the need to fully understand how learning about a new topic and searching for it are now intertwined as part of the same dynamic process.

* the ability to teach re-findability, which partly relates to teaching how to set up a workflow to accurately move references from initial discovery to final paper.

* the ability to help a student evaluate and then buy a paper copy of a book, outside of the usual library channels.

filetype:pdf working in Google Scholar

02 Wednesday Dec 2009

Posted by futurilla in JURN's Google watch

Oh, this is interesting. filetype:pdf is now working in Google Scholar. It used to be ignored. Using it seems to filter out citation-only records. Results are still cluttered with paywall Springer / Oxford / Sage / Muse etc results — those services will happily send a PDF which will always fail to open on a home connection, presumably due to encryption — but the results are noticeably different and give a better chance of obtaining full-text articles.
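
For anyone who wants to script such searches, the operator simply goes into the query string. A small sketch: the function name and search terms are made up, and whether Scholar goes on honouring `filetype:pdf` is of course up to Google.

```python
from urllib.parse import urlencode


def scholar_pdf_query(terms):
    """Build a Google Scholar search URL with the filetype:pdf operator appended."""
    query = f"{terms} filetype:pdf"
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


print(scholar_pdf_query('"landscape history" hedgerows'))
```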

Tenurometer

02 Wednesday Dec 2009

Posted by futurilla in Academic search, How to improve academic search, Spotted in the news

Tenurometer is a Firefox addon that works with Google Scholar…

“to facilitate citation analysis and help evaluate the impact of an author’s publications.”

Sadly the makers of the addon are dangerously wrong, in writing that…

“Google Scholar provides excellent coverage”

Scholar provides only very marginal coverage of several thousand independent and open access titles in the arts and humanities. Another problem might arise from the fact that it also indexes repositories and home-pages, as well as journals. Further problems with using Google Scholar for assessing impact have been discussed elsewhere by others.

One other thing that goes unexplained is how to access Tenurometer once you’ve installed it. It’s an addon that’s counter-intuitively accessed under the “View” menu rather than “Tools”/Add-ons. To turn it on you need to go to… [screenshot not preserved]

Then you get… [screenshot not preserved]

You need to type “p” to get a drop-down predefined list of subject tags.

At the moment, it’s painfully slow — taking over a minute to process a simple History subject area query for author Klaus Graf. Finally, after six erroneous pages of medical papers Tenurometer offered a correct link to: “Reich und Land in der sudwestdeutschen Historiographie um 1500”. The “filter results by subject area” option still needs some heavy work, it seems.

A short guide to free academic search – updated

01 Tuesday Dec 2009

Posted by futurilla in My general observations

JURN’s “A short guide to free academic search” has been significantly overhauled and link-checked in the past two days.

Lost World of Old Europe

01 Tuesday Dec 2009

Posted by futurilla in New titles added to JURN

Added to the JURN site-index:—

Three free PDF chapters, such as “The Figurines of Old Europe” (PDF, 12Mb), from the sumptuous exhibition catalogue for The Lost World of Old Europe: the Danube Valley, 5000 – 3500 BC, a show on now in New York.

   [ Hat-tip: AWOL blog ]


“The Thinker” from Cernavoda, and female figurine. 5000 – 4600 BC.


The Vinca “alphabet”, common symbols (6000 – 5000 BC, south-east Europe).

Three more titles added

01 Tuesday Dec 2009

Posted by futurilla in New titles added to JURN

Added to the JURN site-index today:—

Working with English

Discursos Fotograficos

Journal of Intellectual Property Rights (Has occasional articles on indigenous cultures and IP, internet and IP, fashion and IP)
