News from JURN

~ search tool for open access content

Category Archives: JURN's Google watch

Moving from Google Noise Reduction to Google Hit Hider

22 Tuesday Mar 2011

Posted by futurilla in JURN tips and tricks, JURN's Google watch

Firefox 4 final is out now. Sadly it breaks the Greasemonkey script Google Noise Reduction, which was an excellent per-domain results blocker for Google Search.

However, the new and powerful Google Hit Hider does work very well, and is very similar. It has obviously learned a lot from earlier software like Blocksite, Surfclarity, and Noise Reduction (none of which work any longer with FF4 / the latest Google), and there are some nice refinements, not the least of which is very easy import/export as simple plain-text lists of URLs.

It’s a fairly simple process to get your hand-crafted Noise Reduction blocklist out of Firefox and into Google Hit Hider…

1. In Firefox’s address bar, type: about:config

2. Scroll down to greasemonkey.scriptvals.http://exego.net//Google Noise Reduction.blacklist where you’ll see…

({‘britannia.com’:true, ‘oxfordjournals.org’:true, ‘tandf.co.uk’:true, ‘ingentaconnect.com’:true, ‘sagepub.com’:true, ‘myspace.com’:true, ‘experts-exchange.com’:true})

3. Double click on the line of banned URLs you’ll find there, and copy them to Notepad.

4. Now just top-and-tail the list (strip the opening and closing brackets), then search and replace until you have a clean list, but leave each URL separated by a single comma. Save the list as a .csv (comma-separated values) file, then open that with MS Office’s Excel (or OpenOffice Calc, the free equivalent). The list should load up with one URL per cell.

5. Now just copy and paste the resulting cleaned list into: Manage Hiding / List Util / ‘Perma-ban list’ in Google Hit Hider.
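If the manual search-and-replace in step 4 feels fiddly, the clean-up can be sketched in a few lines of Python. This is only a rough sketch: the function name is my own invention, it assumes the copied value looks like the example shown in step 2, and printing one domain per line is an assumption about what the Hit Hider list box will accept.

```python
import re

# Straight and curly quote characters, since a copied value may contain either.
_Q = "'\"\u2018\u2019"
# One blacklist entry: a quoted domain followed by ":true".
_ENTRY = re.compile(f"[{_Q}]([^{_Q},\\s]+)[{_Q}]\\s*:\\s*true")


def parse_noise_reduction_blacklist(raw: str) -> list[str]:
    """Return the domains found in a Noise Reduction blacklist string."""
    domains = []
    for match in _ENTRY.finditer(raw):
        domain = match.group(1).lower()
        if domain not in domains:  # drop duplicates, keep original order
            domains.append(domain)
    return domains


# Hypothetical sample input, in the same shape as the value in about:config.
raw = "({'britannia.com':true, 'oxfordjournals.org':true, 'tandf.co.uk':true})"
print("\n".join(parse_noise_reduction_blacklist(raw)))
```

Joining the result with ", " instead of a newline would give the comma-separated form described in step 4.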

The advantages of this over the now-native Google blocking are that: i) it lets you break the 500 URL limit; ii) you can block domains en masse rather than one at a time; and iii) it lets you easily import/export the blocklist, in order to share with colleagues etc.

Google launches Google Art

01 Tuesday Feb 2011

Posted by futurilla in JURN's Google watch, Spotted in the news

A new service from Google, Google Art Project…

“Explore museums from around the world, discover and view hundreds of artworks at incredible zoom levels, and even create and share your own collection of masterpieces.”

Based on the Google Maps technology and its familiar interface, the images are gigapixel and presented without watermarks. There are just 17 gigapixel images to start with, and there are also StreetView-like tours of the participating museums. If images look a little blurry as you zoom in, then simply give the tiles time to load (in a similar way to Google Earth), and the sharper tiles should appear.

The “Add” button for the creation of personal collections doesn’t seem to work in Firefox.

New Google search modifier

20 Thursday Jan 2011

Posted by futurilla in JURN's Google watch

Interesting new Google search modifier…

inblogtitle:keyword

The rising tide of Web spam

06 Thursday Jan 2011

Posted by futurilla in JURN's Google watch, My general observations

A big dollop of lazy journo-bluster has landed at The Guardian, over the amount of outright spam that’s been inveigling itself into the Google search-results.

This growing so-called backlash is largely down to some users thinking they can still type in dishwasher review and get good results. Those “two keywords is enough” days are over; just spend 50 minutes learning how to search properly, guys. Yet some people are going to find learning this more difficult than others: more and more people who are not fully literate are now trying to use the web. They can’t skim-read the results very well, or remember how to do complex strings of search modifiers. The ‘advanced search’ forms scare them. All the more reason why we need to be teaching search literacy from infant school onward.

Perhaps the Googleplexers who do nothing else but weed for spam are being temporarily overwhelmed? There’s an obvious tidal wave of robot-registered domains being populated by robots with robot-made pages. 99% of this Web spam has never seen a human hand, other than in the plagiarised material that gets pirated, semi-garbled, and pasted into the page. So, hire as many people as it takes to rip out the spam. It’s not as though Google doesn’t have the cash to throw another 500 eyeballs at the problem.

The other problem that people seem to be raising in the Guardian comments is that we don’t really have a reliable hand-made search-engine for product reviews, one that is devoted to serving only reliable reviews from reliable sources — and nothing else. Certainly, I’ve never found one I like and feel I can trust, and which is comprehensive in its sources and relevant to the UK.

Competition for the Google CSE

04 Tuesday Jan 2011

Posted by futurilla in How to improve academic search, JURN's Google watch, Spotted in the news

IndexTank, custom search in a box. Nice idea. But it seems to be aimed at individual businesses looking to reduce their IT overheads, and is useless as a replacement for a Web-wide Google CSE…

“IndexTank doesn’t actively fetch data from you as a web crawler would do. Instead, your application sends IndexTank the data as soon as it is created or updated”

“not a standalone web search engine, and we don’t currently have a way for you to set it up directly through the Web. It requires downloading software such as a WordPress plugin (if you wanted to add better search to your blog, for example) or writing a program to interact with our servers.”

Worse, it can’t even auto-extract indexable text from the PDFs you send it…

“IndexTank, like other full-text search alternatives, indexes only text. However, for common formats like PDF or Word, it is very easy to parse them to obtain the readable text by using open source tools.”

I should mention some of the other ‘sort-of’ search-in-a-box options.

* The old and vulnerable (in the light of the Delicious closure) Yahoo BOSS

* Spinn3r. But it can only supply “A-list” blog content (so possibly not much use for hyperlocal indexing of a city-region), and you have to build your own widget to hook into its API.

* 80 Legs is a pricey monthly-subscription web-crawler. I’m uncertain if their stated ‘URL limit’ refers to the number of URLs on the originating site-list, or the number of files actually found by their crawler. If it’s the latter, you could run out of space very fast.

* And of course the new Blekko, which lets you upload a text file full of your selected URLs, and then uses them to create a ‘slashtag’ that delimits people’s searches. The last one is interesting, and I might eventually have a play around with it. Although possibly that’ll be when you’re no longer limited to 1,000 URLs, and are allowed to use wildcards in the URL list.

It’s great to see some competition emerging for Google CSEs, and perhaps it will eventually spur Google into offering a commercial ‘Deep’ Web-wide version of the Custom Search Engine: full-text deep indexing of all the documents found at any website it’s pointed at; all the documents found are drawn on to produce your custom search results, every time; and the user gets 12,000 URLs to play with. Or perhaps Microsoft Bing will offer such a service. It might be limited to non-profits, so as to keep the SEO spivs out.

Spamming Google Scholar

22 Wednesday Dec 2010

Posted by futurilla in JURN's Google watch, Ooops!

Spamming Google Scholar. Very possible, or so it seems…

“…we conducted several tests on Google Scholar. The results show that academic search engine spam is indeed – and with little effort – possible: We increased rankings of academic articles on Google Scholar by manipulating their citation counts; Google Scholar indexed invisible text we added to some articles, making papers appear for keyword searches the articles were not relevant for; Google Scholar indexed some nonsensical articles we randomly created with the paper generator SciGen; and Google Scholar linked to manipulated versions of research papers that contained a Viagra advertisement.”

Beel, J. (2010)
Academic Search Engine Spam and Google Scholar’s Resilience Against it.
Journal of Electronic Publishing 13 (3), December 2010.

AROUND Google

15 Wednesday Dec 2010

Posted by futurilla in JURN's Google watch

A new Google search modifier… AROUND.

apples AROUND(3) pears

…gives results that contain the word “apples” within three words of “pears”.

[ Hat-tip: Researchbuzz ]
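To make the “within three words” idea concrete, here is a small Python illustration of a proximity test. It is not how Google implements AROUND; the function name is made up, and it assumes that “within N words” means the two word positions differ by no more than N.

```python
import re


def within_n_words(text: str, first: str, second: str, n: int) -> bool:
    """True if `first` and `second` occur at most n word positions apart."""
    words = re.findall(r"\w+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of one word against every occurrence of the other.
    return any(abs(i - j) <= n for i in first_pos for j in second_pos)


print(within_n_words("apples and some pears", "apples", "pears", 3))
print(within_n_words("apples are nothing at all like pears", "apples", "pears", 3))
```

The first call finds the two words three positions apart, so it matches; in the second they are six positions apart, so it does not.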

Google’s new ‘Advanced Reading Level’

10 Friday Dec 2010

Posted by futurilla in JURN's Google watch

Google has implemented a new filter that allows the filtering of search results by ‘reading level’. It’s accessed via the Advanced Search page, thus…

In a search for the term “reading level”, with the Reading Level set to Advanced, I still had a basic About.com page in the first page of results, as well as this blatant SEO spam page as result No.8.

A search for ‘tolkien + symbols’ showed better results, with a solid and useful first two pages of results. These were not that much different from the standard search, though, except that using Advanced Reading Level blocked a result from the scumbag SEO spam domain directhit.com, which appears on the second page of plain results.

How to import/export a list of banned URLs from the Google Noise Reduction script, for Firefox + GreaseMonkey.

20 Saturday Nov 2010

Posted by futurilla in How to improve academic search, JURN tips and tricks, JURN's Google watch

You may have spent some time building up a list of banned URLs for the Firefox addon Surfclarity, which strips unwanted domains from Google Search Results. Surfclarity no longer works with the latest Google changes, but the Greasemonkey script Google Noise Reduction does. In this tutorial we’ll swop the Surfclarity blacklist into the Google Noise Reduction blacklist.

1. In Firefox’s address bar, type: about:config.

2. Scroll down to extensions.surfclarity.patterns

Double click on the line of banned URLs you’ll find there, and copy them to Notepad.

3. Scroll further down to greasemonkey.scriptvals.http://exego.net//Google Noise Reduction.blacklist and take a look at the format. Note that it’s a little different than Surfclarity…

({‘britannia.com’:true, ‘oxfordjournals.org’:true, ‘tandf.co.uk’:true, ‘ingentaconnect.com’:true, ‘sagepub.com’:true, ‘myspace.com’:true, ‘experts-exchange.com’:true})

So we’re going to have to do some basic search-and-replace on our Surfclarity blacklist. Back up the Google Noise Reduction.blacklist if you want, as we’re going to overwrite it in a few moments.

4. Go back to Notepad and look at the list of Surfclarity URLs you just copied out.

Search for : and replace with : ‘ — note the space after the “:”.

Then search for : and replace it with ‘:true,

Now add ({‘ to the very start of this list, and ‘:true}) to the very end of this list.

Congratulations, you now have your Surfclarity list in Google Noise Reduction format.

5. Copy your new list to the clipboard, go back to greasemonkey.scriptvals.http://exego.net//Google Noise Reduction.blacklist, clear what’s in there at the moment, and then paste the new list in. You’re done.

Obviously, you can now also copy a backup of the Google Noise Reduction.blacklist in the same way.
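For a long list, the wrapping in step 4 could also be scripted. The Python sketch below starts from a plain list of domains, since how you split the Surfclarity value into separate domains depends on its own format. The function name is hypothetical, and the straight single quotes are an assumption about what the stored preference actually uses.

```python
def to_noise_reduction_format(domains: list[str]) -> str:
    """Wrap a list of domains in the ({'domain':true, ...}) blacklist format."""
    # Skip blank entries and trim stray whitespace around each domain.
    entries = ", ".join(f"'{d.strip()}':true" for d in domains if d.strip())
    return "({" + entries + "})"


# Hypothetical example list of banned domains.
banned = ["britannia.com", "myspace.com", "experts-exchange.com"]
print(to_noise_reduction_format(banned))
```

The printed string can then be pasted into the blacklist preference as in step 5.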

How to get the old Google Images back

13 Saturday Nov 2010

Posted by futurilla in JURN's Google watch

Remove the new Google Image Search’s increasingly annoying ‘Bing-bling’, by using Firefox + GreaseMonkey + a potent combination of Google Image Basic and Direct Images in Google Image Search!. Image search then reverts to how it used to be. Clicking on a thumbnail in the search-results takes you straight to the largest version. When searching for images “larger than…” you may need to tell Firefox (one-time only) what application to open the image with, rather than having it pop up a “where would you like to download this to…” prompt. I told it to open large images with Firefox itself, and large images then open in a new Firefox tab. Nice.

And, while you’re at it… Flickr: link all sizes.


Please become my patron at www.patreon.com/davehaden to help JURN survive and thrive.
