URL fragments

Someone asked whether I was aware that the Google Custom Search “Sites” control panel needs a distinctive fragment of a URL entered if it is to search accurately for a match. This matters when checking whether a URL is already in the site index: Google Custom Search will happily add a duplicate if you let it, especially if you only check against a full URL. Yes, I’m fully aware of this feature, and that the matching is case-sensitive, so I only search for a distinctive fragment of a URL before adding it to the site index.
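To see why a full URL tends to miss while a fragment hits, here is a minimal sketch. It assumes the control panel lookup behaves like a plain case-sensitive substring match against the stored site patterns; the `indexed_sites` list and the `already_indexed` helper are hypothetical illustrations, not part of Google's interface.

```python
# Hypothetical sample: site patterns already in the CSE "Sites" list,
# copied out of the control panel by hand.
indexed_sites = [
    "www.example-journal.org/ojs/index.php/ejs/*",
    "journals.example.ac.uk/Arts/issues/*",
]


def already_indexed(fragment: str, sites: list[str]) -> list[str]:
    """Return every stored pattern that contains `fragment`.

    Python's `in` is a case-sensitive substring test, which mirrors the
    assumed behaviour of the control panel lookup.
    """
    return [s for s in sites if fragment in s]


# A full URL usually misses: the stored pattern has no "http://" and ends in "*".
print(already_indexed("http://journals.example.ac.uk/Arts/issues/vol3/", indexed_sites))
# A distinctive fragment finds the existing entry...
print(already_indexed("example.ac.uk/Arts/", indexed_sites))
# ...but only if the case matches.
print(already_indexed("example.ac.uk/arts/", indexed_sites))
```

The first and third lookups return an empty list, so a careless check would wrongly conclude the journal is not yet indexed and add a duplicate.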

If needed, I also cut the URL back a little, to give Google a wider “spread” in what it picks up from that URL. Ideally, I’m seeking out just the URL that holds the articles (which is often very different from the main journal URL).
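Cutting a URL back can be sketched as dropping trailing path segments and adding a `*` wildcard, the form Custom Search site patterns commonly take. This is a sketch under that assumption; `widen_pattern` and the example URL are hypothetical, and how far to cut back remains a judgement call for each journal.

```python
from urllib.parse import urlsplit


def widen_pattern(url: str, drop_segments: int = 1) -> str:
    """Hypothetical helper: trim trailing path segments and add a '*' wildcard.

    Dropping segments widens the pattern so it covers sibling pages
    (other issues, other article folders) under the same directory.
    Only the host and path are kept; query strings and fragments are ignored.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if drop_segments > 0:
        # Slicing past the start simply yields an empty list.
        segments = segments[:-drop_segments]
    path = "/".join(segments)
    base = parts.netloc + ("/" + path if path else "")
    return base + "/*"


print(widen_pattern("http://www.example-journal.org/ojs/index.php/ejs/issue/archive", 2))
# www.example-journal.org/ojs/index.php/ejs/*
```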

JURN and diacritics/Chinese/Japanese

I’m trying to fix a problem with JURN returning the “raw” HTML codes for Chinese characters into the search box after a search, when it should show the characters themselves.

Changing the supplied code snippet from UTF-8 to iso-8859-1 seems to cure it. But then nothing at all is returned to the search box, even for English queries. That is obviously a non-starter, since I’m not going to cripple JURN in English.
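The “raw” codes are most likely decimal numeric character references such as `&#20013;`. Browsers typically fall back to that form when a form is submitted in an encoding, such as ISO-8859-1, that cannot represent the characters. The sketch below reproduces the effect with Python's standard `xmlcharrefreplace` error handler; it is an illustration of the mechanism, not of Google's own code, and the sample query is an assumption.

```python
import html

query = "中文"  # sample query: "Chinese (language)"

# UTF-8 can represent the characters directly.
utf8_bytes = query.encode("utf-8")
print(utf8_bytes)    # b'\xe4\xb8\xad\xe6\x96\x87'

# ISO-8859-1 cannot, so each character is replaced by a numeric reference.
latin1_bytes = query.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'&#20013;&#25991;'

# Un-escaping the references recovers the original characters.
print(html.unescape(latin1_bytes.decode("ascii")))  # 中文
```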

The bug seems to result from a combination of Google’s remote “show_afs_search.js” JavaScript file (which I can’t change) and my showing the results on the same page as the search box (i.e. the “iframe hosting option”). The language encoding of the search terms is getting stripped out somewhere in the loop back to the search box.

Other people’s Custom Search Engines seem to handle the problem, but only by displaying the results on a second page. When JURN moves to its own domain, I may have to look into a second interface for non-English users that shows the results on a second page. In the meantime, you can just use the “raw” Google page for JURN.

Unless someone can offer a solution? I’ve searched the support forums without result, and it may well be a genuine bug in the “iframe hosting option”. The same bug also causes JURN to refuse non-English accents (i.e. diacritics) in search terms. So “pate” will work, and will find both “pate” and “pâté”, but “pâté” on its own won’t be accepted as a valid search term.
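One plausible mechanism for the accent problem, assuming the search terms pass through a URL query string somewhere in that loop, is a charset mismatch between the side that encodes the term and the side that decodes it. Plain ASCII survives either way, which would fit English queries working. This sketch uses only Python's `urllib.parse` and does not reproduce what “show_afs_search.js” actually does.

```python
from urllib.parse import quote, unquote

for term in ["pate", "pâté"]:
    # Sending side: percent-encode the term as UTF-8 (quote's default).
    sent = quote(term)
    # Receiving side: decoded with the matching charset, then with a mismatched one.
    as_utf8 = unquote(sent, encoding="utf-8")
    as_latin1 = unquote(sent, encoding="iso-8859-1")
    print(f"{term!r}: sent={sent} utf-8={as_utf8!r} latin-1={as_latin1!r}")

# 'pate' comes back unchanged both ways; 'pâté' is sent as p%C3%A2t%C3%A9
# and turns into 'pÃ¢tÃ©' when decoded as ISO-8859-1.
```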

Why name it JURN?

I wanted something memorable, with just a few letters, that was available as a domain name on the .org top-level domain.

JOUR sounded too French.

JURN is a common German/Scandinavian boys’ name. No-one else was using it for anything remotely like a search engine, or even as a trademark, and I had some new artwork to hand featuring a boy who could be the “brand mascot”.

I pictured “Jurn” as some student stuck in the wilds of somewhere like Finland, without paid access to many commercial ejournals. He’d be trying to plough through Google Scholar in English, and getting tangled up in results that constantly demanded payment. JURN is the search engine for that student, and for millions like him around the world who have limited or no access to full-text journal databases.

So… that’s why the new search engine was named JURN. But as an acronym, what might it stand for? Well, you can pick your own meaning, in the style of the old sci-fi zines: Journal Usury Recovery Net? Jolly Urbane Reading Node?