04 Wednesday Jun 2014
GeoDeepDive is software that helps…
geo-scientists extract data that is buried in the text, tables, and figures of journal articles and web sites […] As of today, GeoDeepDive has processed over 36K research papers and 134K web pages
03 Tuesday Jun 2014
David Prosser at Jisc blogs on the need for action on discoverability…
… 40% of researchers kicked off their project with a trawl through the Internet for material, while only 2% preferred to make a visit to a physical library space. [yet] nearly half of all items within digitised collections are not discoverable via major search engines by their name or title [and, even worse] digitised collections become harder and harder to find over time, for a variety of complex reasons.
18 Sunday May 2014
Oaddo is an early alpha of a cool new search tool. Imagine that Wikipedia and Pinterest combined to give autocomplete a usability makeover, with Trello acting as the makeup girl. The aim is to help you do deep ‘research search’ when you don’t really know what you’re searching for.
It has an interesting way of allowing your search terms to interact with clustered semantic tags, for drilling down to the best search result. Sort of like a Google autocomplete / autosuggest that’s slowed way down and is largely under your control, and is curated by humans — and as a consequence is not dumb.
Oaddo has a nice clean interface too, which is neatly poised between power and simplicity. The developer Tim Borny has obviously been looking at Trello and Pinterest for inspiration. Although at the moment the discarding of search modifier tags takes two clicks, instead of a fun one-click “fling it to the discard tray” movement.
The other innovation is that it aims to have a democratic user-driven model. That aspect might take Oaddo a long way, provided there’s a critical mass of people — and provided a mechanism can be found to rein in the inevitable SEO spivs, ideological censors, and WikiPolice types.
* Users will ‘vote’ on content, curate content and the database of related terms.
* The community will drive the addition of new features.
So, very interesting. Amid the sea of recent search launches, this is actually one to watch. Here’s Tim Borny’s full explanation…
https://www.youtube.com/watch?v=vCGeIV9NctA
04 Sunday May 2014
New long interview with Kathleen Shearer, Executive Director of COAR, on repositories. With a strong focus on discoverability as seen from a broad strategic perspective. From the intro and questions…
“locating and accessing content in OA repositories remains a hit and miss affair, and while many researchers now turn to Google and Google Scholar when looking for research papers, Google Scholar has not been as receptive to indexing repository collections as OA advocates had hoped. … 15 years after the Santa Fe meeting they [researchers] still find it extremely difficult, if not impossible, to search effectively in and across OA repositories”
From the interview…
“… ‘mega-journals’ are essentially repositories with overlay services. We should be participating in projects that demonstrate the added value of repositories and repository networks across the research life cycle.” (Kathleen Shearer)
23 Wednesday Apr 2014
The Bing search engine is now offering predictions…
“… teams within Bing have been experimenting with useful ways that we can harness the power of Bing to model outcomes of events. … Today we are bringing these insights directly to our search results pages. Based on a variety of different signals including search queries and social input from Facebook and Twitter, we are unveiling an experiment we’ve built to give you our prediction of the outcome of a given event.”
The front cover of the latest Smithsonian magazine also heralds the Future Studies meme…
27 Thursday Mar 2014
I had a quick look at the full list of Schema.org tags, which are now available in Google CSEs. They can be used to filter the CSE’s site list, serving to “Restrict pages from the above site list to only those that contain [chosen] Schema.org types”. Handy if you have a huge single site of HTML/CSS/XML that you can grep, and you want to prepare it for selective CSE search without having to juggle directories and file names.
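As a rough illustration of the "grep your own site" preparation step, here is a minimal Python sketch that lists which Schema.org types a page declares. It assumes the pages use HTML microdata (`itemtype` attributes) and ignores JSON-LD and RDFa; the function name `schema_types` and the sample fragment are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class ItemTypeCollector(HTMLParser):
    """Collect Schema.org type names declared via microdata `itemtype` attributes."""

    def __init__(self):
        super().__init__()
        self.types = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes such as `itemscope` arrive with value None.
            if name == "itemtype" and value:
                # An itemtype attribute may hold several space-separated URLs.
                for url in value.split():
                    if "schema.org/" in url:
                        self.types.add(url.rstrip("/").rsplit("/", 1)[-1])


def schema_types(html_text):
    """Return the set of Schema.org type names found in one HTML page."""
    collector = ItemTypeCollector()
    collector.feed(html_text)
    return collector.types


# Hypothetical page fragment, for illustration only.
sample = (
    '<div itemscope itemtype="http://schema.org/ScholarlyArticle">'
    '<h1 itemprop="name">Example</h1></div>'
)
print(schema_types(sample))
```

Run over every file in a site, something like this would show which pages a Schema.org-restricted CSE could be expected to pick up, and which pages still lack any type at all.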
It looks to me like those tagging open access scholarly articles would need to be able to chain Schema.org tags into something like…
CreativeWork: ScholarlyArticle: TransferAction: DownloadAction: GiveAction:
Whereas paywall publishers might need something like:
CreativeWork: ScholarlyArticle: TransferAction: DownloadAction: SellAction:
But at present there seems to be only the basic undifferentiated…
CreativeWork: ScholarlyArticle:
Even if there were workable OA additions to Schema.org, there would still be the huge problems of: i) persuading people to add the tags to all their ongoing content at the article level, and to do so correctly and consistently; and ii) having them go back and accurately tag perhaps two decades or more of existing open access articles.
21 Friday Mar 2014
I found a 2013 article from geoscientists who had tested Google Scholar: “Literature searches with Google Scholar: Knowing what you are and are not getting”. Although the body of the paper states that their test phrase was “wildfire-related debris flows”, the data shows they actually tested Scholar with the unquoted keywords wildfire-related debris flows. They reportedly found that…
“free articles were available in PDF format for 88% of citations returned by Google Scholar. They were available from open-access journals or via links to organizational sites where authors had posted their publications.”
However, if you actually look at their linked search-results data file, the above statement needs additional clarification, since it’s clear that paywall articles from Elsevier, Springer and the like, appearing in their Scholar results, were being counted toward those “free articles”. It turns out that many of these were “free” only via a DigiTop proxy overlay for Scholar that is, in the words of DigiTop, “available to USDA employees only”. Nice if you work under the U.S. Department of Agriculture umbrella, but it seems that those outside have to pay.
Does Google Scholar perhaps need to add some kind of “paywall box detector” to its scraper bots? Then perhaps something like [PDF] [-||-] could be added on the right-hand column of the Scholar results, to indicate a PDF that’s “available maybe” — but which will prove to have a paywall that needs to be either backed out from or negotiated? And perhaps [PDF] [-~-] could indicate a genuine direct link to a bona fide PDF file?
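To make the idea a little more concrete, here is a very crude Python sketch of such a detector. It only checks a result's host name against a short list; the list, the function names and the sample URLs are all my own illustrative assumptions, not anything Scholar does. A real detector would need to inspect the landing page itself, since a publisher host can also serve genuinely open articles, and an unlisted host can still hide a paywall.

```python
from urllib.parse import urlparse

# Illustrative host list only (an assumption for this sketch); it is neither
# complete nor a reliable sign that any given article is paywalled.
PAYWALL_HOSTS = ("sciencedirect.com", "link.springer.com")


def is_probably_paywalled(url):
    """Return True if the URL's host is, or is a subdomain of, a listed host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in PAYWALL_HOSTS)


def marker_for(url):
    """Pick one of the two markers suggested in the post for a result link."""
    return "[PDF] [-||-]" if is_probably_paywalled(url) else "[PDF] [-~-]"


def open_share(urls):
    """Fraction of result links that are NOT flagged as probably paywalled."""
    if not urls:
        return 0.0
    flagged = sum(1 for u in urls if is_probably_paywalled(u))
    return (len(urls) - flagged) / len(urls)


# Hypothetical result links, for illustration only.
results = [
    "https://www.sciencedirect.com/science/article/pii/EXAMPLE",
    "https://example.org/papers/debris-flows.pdf",
]
for link in results:
    print(marker_for(link), link)
print(open_share(results))
```

The same subtraction is what lies behind any corrected "free articles" percentage: count the results, remove the ones that only open behind a proxy or paywall, and divide by the total.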
Anyway… this is what geoscientists are talking about when they refer to wildfire-related debris flows. Seems like it might be a geological process that intelligent farmers, hiker-campers, and treeline homesteaders around the world would like to learn some precise details about…
Giant mudslides, basically.
Incidentally, the same wildfire-related debris flows search in JURN needs to be tightened up just a little for strong results. Using wildfire-related “debris flows” works better, though the first six pages of good results do stray just a little (to pick up what seem to be three articles about prehistoric ‘dinosaur-era’ debris flow events). Yet even on this test JURN appears to be doing about twice as well as Google Scholar in terms of getting open articles, once Scholar’s ‘false-positive’ paywall PDFs from Elsevier & co. are subtracted from Scholar’s results.
10 Monday Mar 2014
Ten years ago, today…
JISC ITT commission: A study to forecast a delivery, management & access model for eprints & open access journals within Further and Higher Education. … Access should be streamlined and free at the point of use, irrespective of the source of content.
Submission deadline: 10th March 2004 12:00
Funding: £30,000
03 Monday Mar 2014
Joseph Esposito has usefully had a peek inside a very expensive commercial market report titled Global Social Science & Humanities Publishing 2013-2014.
Social/Humanities publishing is found to be perhaps 25% of the size of Science/Technology/Medicine, at around $5bn. That actually strikes me as something of an achievement, when you consider that we have far smaller research funding inputs and a smaller technical/training infrastructure to call on. But perhaps the $5bn figure is given a strong boost by teacher training textbooks, social work manuals and the like?
Joseph highlights the report’s finding of a highly fragmented market. This market fragmentation is one of the reasons I’m skeptical about the success of a ‘one metadata to rule them all’ solution to OA indexing and discovery. It seems that DOAJ-listed OA journal titles can’t even find their way in full-text into the largest of commercial databases (such as EBSCO Complete) at higher levels than just over 20%. When last heard of, the Web of Science / Scopus seemed to be barely scraping 1,000 OA titles indexed. One art history study found that Google Scholar could index only half the DOAJ’s OA art history titles.
A dastardly conspiracy to keep OA titles out of these big indexes seems unlikely. So I suspect it’s largely due to many OA editors in the arts and humanities not giving a fig about providing the means to automatically index their content. Their widespread lack of something as basic as RSS feeds seems to confirm that. Add to that the fact that only 56% of DOAJ journals can supply the DOAJ with article metadata.
Persuading non-librarian types to do something as simple as tagging all their back-issue content with some simple new machine-readable OA tag thus seems rather a long shot. Persuading mainstream publishers to do the same? Well… maybe, but what’s their incentive for that? Even if they do, will they allow mass harvesting of the OA articles? Nor are librarians likely to be of much use, after the fact of publication — since they seem to have mostly failed to apply even their own metadata standards to open content, and open repository metadata quality is reported to be dire.
27 Thursday Feb 2014
Wouter has hacked out a Google Scholar API workflow today, sort of. I suspect the reason Scholar has never offered an API is the agreements Google has with the large commercial journal publishers and citation database providers.