News from JURN

~ search tool for open access content

Monthly Archives: September 2015

Placing text

29 Tuesday Sep 2015

Posted by futurilla in How to improve academic search

A fascinating and very clearly written April 2015 article about automatically mining geolocation points out of plain text: “Mapping Words: Lessons Learned From a Decade of Exploring the Geography of Text”…

“In Fall 2014 I collaborated with the US Army to create the first large-scale map of the geography of academic literature and the open web, geocoding more than 21 billion words of academic literature spanning the entire contents of JSTOR, DTIC, CORE, CiteSeerX, and the Internet Archive’s 1.6 billion PDFs relating to Africa and the Middle East, as well as a second project creating the first large-scale map of human rights reports. A key focus of this project was the ability to infuse geographic search into academic literature…”

We probably need a name for such activities, and also for mining eco/geo data out of old paintings and photographs of landscapes. Geo-mining is too 20th century and eco-unfriendly. Geo-gleaning and Geo-gleaner are terms that have a certain poetry about them, while also suggesting both the curatorial and the imprecise nature of the techniques.

Google Scholar and grey literature

28 Monday Sep 2015

Posted by futurilla in Academic search, JURN's Google watch, Spotted in the news

Interesting new paper at PLOS One, “The Role of Google Scholar in Evidence Reviews and Its Applicability to Grey Literature Searching”.

Test searches were drawn from review papers…

“…chosen as they covered a diverse range of topics in environmental management and conservation, and included interdisciplinary elements relevant to public health, social sciences and molecular biology.”

… and compared alongside Web of Science results…

“Surprisingly, we found relatively little overlap between Google Scholar and Web of Science (10–67% of WoS results were returned using searches in Google Scholar using title searches).”

Unsurprisingly, Google Scholar wasn’t found to be the one-stop shop many assume it to be…

“… some important evidence was not identified at all by Google Scholar … [so it] should not be used as a standalone resource in evidence-gathering exercises such as systematic [literature] reviews.”

Interesting finding also that…

“Peak” grey literature content (i.e. the point at which the volume of grey literature per page of search results was at its highest and where the bulk of grey literature is found) occurred [in Google Scholar] on average at page 80 (±15 (SD)) for full text results … page 35 (± 25 (SD)) for title [search] results.”

So this suggests that one might usefully flick through to result 700 (of 1000) and work a few hundred results starting from there, if seeking grey literature with a very well-formed topic search? By well-formed I mean the sort of sophisticated literature-review style of search term chaining being used in this study, for example…

“oil palm” AND tropic* AND (diversity OR richness OR abundance OR similarity OR composition OR community OR deforestation OR “land use change” OR fragmentation OR “habitat loss” OR connectivity OR “functional diversity” OR ecosystem OR displacement)
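The page-to-result arithmetic behind that suggestion can be roughly sketched as below. The figure of 10 results per page is an assumption (the quoted passage does not state it), and the function name is purely illustrative.

```python
def result_window(peak_page, sd_pages, per_page=10):
    """Return (first_result, last_result) spanned by peak_page +/- one SD.

    per_page=10 is an assumption, not a figure taken from the paper.
    """
    first_page = max(1, peak_page - sd_pages)
    last_page = peak_page + sd_pages
    first_result = (first_page - 1) * per_page + 1
    last_result = last_page * per_page
    return first_result, last_result


print(result_window(80, 15))  # full-text searches: peak at page 80, SD 15
print(result_window(35, 25))  # title searches: peak at page 35, SD 25
```

Under that assumption the full-text peak spans roughly results 641 to 950, which is consistent with starting at about result 700 and working on from there.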

It appears that the researchers only auto-extracted “citation records” from the search results, and then classified into broad categories based on those alone. There appears to have been no checking as to the validity of the link, and/or downloading and scrutiny of PDFs. So there are no measurements of how many of Google Scholar’s links work or lead to free no-paywall fulltext articles.

Lastly, I noted…

“Google Scholar has a low threshold for repetitive activity that triggers an automated block to a user’s IP address (in our experience the export of approximately 180 citations or 180 individual searches). Thankfully this can be readily circumvented with the use of IP-mirroring software such as Hola (https://hola.org/)”

Has it leaked?

25 Friday Sep 2015

Posted by futurilla in My general observations, Spotted in the news

Has it leaked? is a rather nice specialist search tool for free content, from Sweden. Focussed on forthcoming arty music albums, it basically saves fans the task of tracking down the tracks / snippets / “making of…” etc that the official marketeers ‘leak’ for free in advance of the album, or during the release window. It’s not a pirate site, though, and firmly states: “No download links are allowed!”.

I’d say there’s room in the market for something similar for all quality non-fiction books, perhaps in partnership with a book-summary service like Blinkist, and with user-configurable topic filters.

Why would such a site be needed? Here’s an instance of the limited way in which current mega-services offer to group versions or offer preview options. If one looks at Amazon UK for the new Matt Ridley book The Evolution of Everything: How New Ideas Emerge one only sees two options there for the audiobook: free with an Audible direct-debit subscription, or a £30 pre-order and wait until November for delivery. Even then the audiobook pages are not linked from the print book page, so someone landing on the print page via Web search would have no clue there even was an audiobook version. No mention at all on Amazon UK that it’s actually available now for £13 on the Audible UK site, or that there’s a free 13 minute extract of the introduction of the audiobook available via publisher on SoundCloud. Only my deep searching surfaced the free audiobook extract.

The above suggests that two mega-services (Amazon and Audible) and a mega-publisher (Harper) can’t even co-ordinate promo material and version offers for a major book in the globally important UK market. So I’d say there’s a lot of scope for savvy curators to do it for them, also adding author podcast links, newspaper book review links etc.

DuckDuckGo testing #2

24 Thursday Sep 2015

Posted by futurilla in JURN tips and tricks

I did a quick experiment in making a Custom Search Engine via DuckDuckGo’s link-chaining feature. In this experiment I enable a search across a small group of reputable crowdfunding services. The search format is…

"open access" site:patreon.com,gofundme.com,peerbackers.com,mysherpas.com,wedidthis.org.uk,crowdcube.com,cofundos.org,indiegogo.com,rockethub.com,kickstarter.com

Works fine. WordPress.com refuses to embed an active link that contains “a phrase” (it’s the inverted commas, presumably), but this test link should work.
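As a minimal sketch, a chained search like this can also be assembled and percent-encoded in a few lines of Python, which yields a link containing %22 rather than literal inverted commas. The function names here are hypothetical, and the ?q= URL form is simply the one used by ordinary DuckDuckGo search links.

```python
from urllib.parse import urlencode


def chained_site_query(terms, domains):
    """Join the domains with commas after a single site: modifier."""
    return f"{terms} site:{','.join(domains)}"


def ddg_search_url(query):
    """Percent-encode the query into a DuckDuckGo search URL."""
    return "https://duckduckgo.com/?" + urlencode({"q": query})


# A shortened version of the crowdfunding list, for illustration only.
crowdfunders = ["patreon.com", "gofundme.com", "kickstarter.com"]
query = chained_site_query('"open access"', crowdfunders)
print(query)
print(ddg_search_url(query))
```

Whether a given blogging platform will accept the encoded link is untested here.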

Unfortunately chaining a list of URLs appears to turn off DuckDuckGo’s intitle: search modifier, at least when searching for a phrase. But intitle: does work when using a single keyword, in a search such as…

intitle:journal "open access" site:patreon.com,gofundme.com,peerbackers.com,mysherpas.com,wedidthis.org.uk,crowdcube.com,cofundos.org,indiegogo.com,rockethub.com,kickstarter.com

A keyword / phrase that veers more into popular culture (such as Lovecraft) seems to cause Kickstarter results to swamp the search results.

I also noted that the search results from the above example fail to distinguish between “open access” and “open-access”. Adding +, as in +“open access”, fails to force a verbatim search. There is obviously some slight wiggle-room in DuckDuckGo’s claim that they don’t try to second-guess your search terms. Google has the same problem, with a Verbatim mode that is not really verbatim.

There’s no sort-by-date filter on the search results, and adding the search modifier sort:date to the search causes a chained-URLs search to totally fail.

Sadly a list of chained URLs just doesn’t work with DuckDuckGo’s Image Search. For instance, a searcher can’t constrain Image Search thus…

"cute cat" site:flickr.com,deviantart.com,commons.wikimedia.org

When looking for Creative Commons images using DuckDuckGo Image Search a better strategy is probably simply to dispense with the URL chain and use this…

"cute cat" "some rights reserved" OR "cute cat" commons attribution -noncommercial

This will still pick up “noncommercial” CC pictures on Flickr (since Flickr obfuscates the picture’s license behind a “some rights reserved” generality), but at least you’d be headed in the right direction. Note that it seems that DuckDuckGo only lets you use a single minus sign to knock out one keyword from the search, and it has to be at the end of the search to work.

A “Region” filter doesn’t appear to work on Image Search. You can’t just see the “cute cats” of Japan, for instance.

DuckDuckGo testing #1

23 Wednesday Sep 2015

Posted by futurilla in JURN tips and tricks

First finding from my DuckDuckGo search testing: site: is not at all a reliable indicator of what is indexed, when using an extended URL. For instance, the PDFs of the Joint Nature Conservation Committee, UK…

site:http://jncc.defra.gov.uk/pdf/

One lone result in DuckDuckGo. However, search for…

“The Vascular Plant Red Data List for Great Britain”

And up it pops at…

http://jncc.defra.gov.uk/pdf/pub05_speciesstatusvpredlist3_web.pdf

So the PDFs at http://jncc.defra.gov.uk/pdf/ are in there then, but it seems they can only be surfaced in DuckDuckGo by using…

site:jncc.defra.gov.uk filetype:pdf
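The rewrite from an extended URL to a host-plus-filetype search can be sketched like this. The helper name is hypothetical, and note that the path (/pdf/) is simply dropped, so PDFs elsewhere on the same host would presumably be returned too.

```python
from urllib.parse import urlparse


def site_filetype_query(url, filetype="pdf"):
    """Turn a directory URL into a 'site:host filetype:ext' search string."""
    host = urlparse(url).netloc
    if not host:
        raise ValueError("expected a full URL such as http://example.org/pdf/")
    return f"site:{host} filetype:{filetype}"


print(site_filetype_query("http://jncc.defra.gov.uk/pdf/"))
# site:jncc.defra.gov.uk filetype:pdf
```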

AdBlock Browser launches

23 Wednesday Sep 2015

Posted by futurilla in Spotted in the news

The Adblock Browser has launched for mobile devices (Android and iOS). DuckDuckGo is their default search-engine.

How to switch to the DuckDuckGo search engine in Firefox

23 Wednesday Sep 2015

Posted by futurilla in JURN tips and tricks

Here are my ten steps to switch to the DuckDuckGo search engine in Firefox, and have it work reasonably well across a PC desktop widescreen. In my view DuckDuckGo’s indexing and relevancy ranking are now ready for a serious test by power searchers. The image search relevancy, for instance, is arguably better than Google Images can offer.

1. In Firefox, switch your ‘home’ search-engine to https://duckduckgo.com/ in Tools | Options | General | Home Page.

There appear to be no useful URL modifiers you can append to this URL, for instance &show=15 to only show 15 results.


2. Look inside the little “three bars” icon in the top-right of your new DuckDuckGo search-engine page. Here you’ll find a variety of visual and feature settings that you can change. Changes can be either saved locally or anonymously to the Cloud. I assume you run AdBlock or similar and won’t need to turn off adverts, though DuckDuckGo kindly lets you turn ads off if you want to.


3. Note that DuckDuckGo’s Settings pages have several tabs, one of which is Appearance. This lets you change fonts and font size, link colours and more. If you just want search results to look a little more Google-y in that respect there’s also a very handy “DuckDuckGo Modifier” UserScript for Greasemonkey that will handle that for you. I like this simple addon a lot, and it will ease the transition greatly for many Google users.


4. You can also pick a base colour theme for the DuckDuckGo home and results pages, and then modify this theme by changing Background Colour using a Hex value from a simple colour-picker widget.


5. Power searchers who have learned to instantly sight-read a URL will want to turn on “Result Full URLs” and “Show the Result URL line above the snippet text”. Unfortunately longer URL paths are truncated.

You may also want to turn off the distracting tiny “Site Icons”, and allow URLs to be copied to the clipboard “as is” rather than in an obfuscated form.


6. Now install a multiple column layout for your search results. “DuckDuckGo – Multi-Columns” is a maintained GreaseMonkey UserScript and also a theme for the Stylish add-on, that will do this for you. Basically it does for DuckDuckGo search-engine results what GoogleMonkeyR does for Google Search results in Firefox.

Picture: Multi-column results, unwanted results hidden, customised colors.

Sadly you can’t limit DuckDuckGo to showing only 15 results, as you can with Google, which means some scrolling with this setup. Though you can turn off Autoscroll in DuckDuckGo’s settings, which helps a bit. But it’s still not ideal, for a widescreen desktop user who wants as little scrolling as possible.

Installing this “DuckDuckGo – Multi-Columns” add-on as a Greasemonkey UserScript is probably best, since you can then easily edit certain features in the script. For instance, you can turn off the distracting “DuckDuckGo – Multi-Columns” results-numbering. To do this, find the script’s code-block that starts…

   "/* (new3) RESULTS - COUNTER */"

… and change the colour and background colour codes to match your theme background, thus effectively making the numbers invisible…

   " color: orange ! important;",
   " background: yellow ! important;",

You will probably also want to change the awful bright red colour that the script uses to highlight your search keywords (when they appear in search result text snippets). To do that find the script’s code-block that starts…

   "/* (new2) - RESULTS HIGHLIGHTING - ",

… and then change the colour name “tomato” to something less garish…

   " color: tomato !important;",


7. Note that the excellent “Google Hit Hider” UserScript add-on for Firefox also works by default with DuckDuckGo. It seamlessly blanks results from URLs you’ve added to your personal list of unwanted domains.

You will probably want to choose to have this add-on’s “Block” button set to appear only when a search result is in a mouseover state, as it’s one less visual distraction for the speedy searcher.


8. Useful search modifiers that work for Google also work for DuckDuckGo…

   -keyword    these must be the last words in the search, to work.

   “specific phrase”

   “the ethics of *”    will wildcard a word in a phrase.

   filetype:pdf or simply f:    will find only PDF files.

   intitle:keyword or simply t:    only results with this keyword in the link.

   sort:date or simply s:d    Gives “sort by date”. This gives a simple re-sort of results to show only the most recently indexed results. My guess is that using s:d only brings results from a ‘recently indexed’ sub-set, and that any ‘recently indexed’ tag is jettisoned by DuckDuckGo once spam and adult content is cleaned and the clean results are passed over into the main index.

   site:imdb.com    Note that you can leave out the www. bit (unlike Google, which expects it).

   -site:wikipedia.org    No Wikipedia results!

   site:imdb.com,rottentomatoes.com    Search multiple domains in one go. It would be more useful if a user could somehow pin their custom chain of such URLs to the search box. But I guess you can set it up as a Bookmark on your Bookmarks Toolbar in Firefox.

   -site:imdb.com,rottentomatoes.com,wikipedia.org    Show no results from any of these sites.

   region:uk    Limit results to a national domain. This is also embodied in a very handy pop-out visual side-widget with nation flags, titled “Region”.

Most of these search modifiers can be combined, but it seems that sort:date only works if it is the final item in the search box.
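A small query builder along these lines could keep the ordering rules straight. This is only a sketch: the function and parameter names are hypothetical, and because the notes above say both that a -keyword must come last and that sort:date must come last, without saying how the two interact, the sketch refuses to combine them rather than guess.

```python
def build_query(terms, sites=(), exclude_sites=(), filetype=None,
                region=None, exclude_keyword=None, sort_by_date=False):
    """Assemble a DuckDuckGo query string from the modifiers listed above."""
    if exclude_keyword and sort_by_date:
        # Both are described as needing to be the final item; untested together.
        raise ValueError("use either exclude_keyword or sort_by_date, not both")
    parts = [terms]
    if sites:
        parts.append("site:" + ",".join(sites))
    if exclude_sites:
        parts.append("-site:" + ",".join(exclude_sites))
    if filetype:
        parts.append(f"filetype:{filetype}")
    if region:
        parts.append(f"region:{region}")
    if exclude_keyword:
        parts.append(f"-{exclude_keyword}")  # kept at the very end
    if sort_by_date:
        parts.append("sort:date")  # kept at the very end
    return " ".join(parts)


print(build_query('"open access"', exclude_sites=["wikipedia.org"],
                  filetype="pdf", sort_by_date=True))
```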


9. DuckDuckGo’s excellent Image Search has no Creative Commons filter. But CC can be approximated by adding keywords e.g.: Commons Attribution -Noncommercial. This actually seems to work very well, though you will of course need to check for the license and not take things for granted.


10. DuckDuckGo’s !bang feature sounds like a pointless gimmick at first, but you soon start to realise the power of the !bang. !a will pass your search directly over to Amazon, and !yt to YouTube, without going through those sites’ respective start pages. !gsc does the same for Google Scholar and !gb for Google Books. There are many more.

The UserScript add-on DuckDuckMenu provides a more familiar “top menu of links” way of using the !bang system. It’s also customisable.

Below are the correctly formatted links for setting up some academic services on this menu. The inserted {searchTerms} section of the URL copies in the already searched search terms that are currently sitting in your DuckDuckGo search box.

Google Search (turn off the stupid AutoSuggest, force Verbatim, and return 15 results for use with three column results layouts such as GoogleMonkeyR):
https://www.google.com/search?q={searchTerms}&tbo=1&num=15&complete=0&tbs=li:1

Google Books:
https://www.google.com/search?q={searchTerms}&tbm=bks

JURN:
https://jurn.link/#gsc.tab=0&gsc.q={searchTerms}&gsc.sort=

Google Scholar UK:
https://scholar.google.co.uk/scholar?hl=en&q={searchTerms}

Amazon Books UK (books only):
https://www.amazon.co.uk/s/ref=sr_nr_i_0?fst=as%3Aoff&rh=k%3A{searchTerms}%2Ci%3Astripbooks

You can also add a handy no-typing way to instantly re-sort your search by date on DuckDuckGo itself:

Re-sort my DuckDuckGo search by sort:date:
https://duckduckgo.com/?q={searchTerms}+sort%3Adate

Add a filetype:pdf link to the menu:
https://duckduckgo.com/?q={searchTerms}+filetype%3Apdf

And an approximate Creative Commons image search can be had by using the link:
https://duckduckgo.com/?q={searchTerms}+commons+attribution+-noncommercial&iax=1&ia=images

Keep in mind that the latter link won’t pick up Flickr’s CC pictures, since Flickr obfuscates the CC licence behind a blanket phrase. For Flickr search it’s probably best to use search.creativecommons.org.
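To illustrate what the {searchTerms} placeholder in the menu links above amounts to, here is a minimal sketch of the substitution. How the DuckDuckMenu script actually performs it is not shown here, so the helper below is an assumption about the general idea rather than the add-on’s own code.

```python
from urllib.parse import quote_plus


def fill_template(template, search_terms):
    """Swap {searchTerms} for the percent-encoded search terms."""
    return template.replace("{searchTerms}", quote_plus(search_terms))


# Two of the link templates listed above.
templates = {
    "Google Books": "https://www.google.com/search?q={searchTerms}&tbm=bks",
    "DuckDuckGo PDFs": "https://duckduckgo.com/?q={searchTerms}+filetype%3Apdf",
}

for name, template in templates.items():
    print(name, fill_template(template, '"oil palm" diversity'))
```

Plain string replacement is used rather than str.format, so that any other braces or percent signs already in a template are left untouched.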


DuckDuckGo appears to have no Current News search function worth talking about (it’s sort of in there, but is very flaky about when it chooses to appear and is obviously not ready for prime-time). But !gn will pass your search through to Google News and !bnews to Bing News.

It seems you can’t yet create something like a BIG!bang that would search across a large collection of 100s or 1000s of specific URLs (like a Google CSE does) and/or RSS feeds, and thus approximate your own custom News search ability.

How to delete a fulltext PDF from ResearchGate

22 Tuesday Sep 2015

Posted by futurilla in JURN tips and tricks, Spotted in the news

This may be handy for some people: how to remove your fulltext PDF from ResearchGate, but leave the record standing. The way to the delete function isn’t very intuitive…

Video: https://www.youtube.com/watch?v=7TqBusqz1nY

Historical ecology and art history

20 Sunday Sep 2015

Posted by futurilla in Spotted in the news

A fine short blog post by Manu Saunders on the historical ecology data latent in art history.

Picture: 1857 bushfire near Timboon, Victoria, Australia.

Tree of Life

20 Sunday Sep 2015

Posted by futurilla in Spotted in the news

Tree of Life, a rough first-try at merging the available data on the relationships of the 2.3 million known and named species on Earth…

“According to a survey of more than 7,500 phylogenetic research papers published between 2000 and 2012, only one out of six studies came with a digital, downloadable format of the data. … Many of the evolutionary trees that have been published are only available as PDFs and other image files that can’t be entered into a database or merged with other trees.”
