IndexTank: custom search in a box. Nice idea. But it seems to be aimed at individual businesses looking to reduce their IT overheads, and it is useless as a replacement for a Web-wide Google CSE…

“IndexTank doesn’t actively fetch data from you as a web crawler would do. Instead, your application sends IndexTank the data as soon as it is created or updated”
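To make the difference from a crawler concrete, here is a toy sketch of the push model. It does not use IndexTank's real client library, which isn't shown here. `PushIndex`, `upsert`, `delete` and `search` are hypothetical names for an in-memory stand-in. The point is simply that nothing gets fetched: the application has to call the index every time a record changes.

```python
import re
from collections import defaultdict


class PushIndex:
    """Hypothetical toy inverted index for the push model: the application
    calls upsert()/delete() whenever a record changes; nothing is crawled."""

    def __init__(self):
        self.docs = {}                    # doc_id -> current text
        self.postings = defaultdict(set)  # term -> ids of docs containing it

    @staticmethod
    def _terms(text):
        # Naive tokenizer: lowercased runs of word characters.
        return set(re.findall(r"\w+", text.lower()))

    def _unindex(self, doc_id):
        # Drop the document's old terms so an update leaves no stale matches.
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in self._terms(old):
                self.postings[term].discard(doc_id)

    def upsert(self, doc_id, text):
        """Called by the app as soon as a document is created or updated."""
        self._unindex(doc_id)
        self.docs[doc_id] = text
        for term in self._terms(text):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        self._unindex(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = self._terms(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

If your blog engine never calls `upsert()` for a post, that post simply never appears in results. That is the practical catch for anyone hoping to index other people's sites.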

“…not a standalone web search engine, and we don’t currently have a way for you to set it up directly through the Web. It requires downloading software such as a WordPress plugin (if you wanted to add better search to your blog, for example) or writing a program to interact with our servers.”

Worse, it can’t even auto-extract indexable text from the PDFs you send it…

“IndexTank, like other full-text search alternatives, indexes only text. However, for common formats like PDF or Word, it is very easy to parse them to obtain the readable text by using open source tools.”
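So how "very easy" is it? PDFs really need a dedicated parser, such as the pdftotext utility from Poppler. Word's .docx format is easier, because it is just a zip archive whose body text sits in `word/document.xml`. Below is a rough, standard-library-only sketch for the .docx case. The function name `docx_text_from_bytes` is hypothetical. It reads only the main document part, so headers, footers and footnotes are skipped.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_from_bytes(data):
    """Hypothetical helper: extract body text from a .docx file's bytes,
    one line per non-empty paragraph.

    Only word/document.xml is read, so headers, footers and footnotes
    (stored in separate parts of the archive) are not included.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        # A paragraph's text is split across runs; join every w:t element.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text:
            lines.append(text)
    return "\n".join(lines)
```

The resulting string is what you would then send to IndexTank, or any other text-only index, in place of the original file.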

I should mention some of the other ‘sort-of’ search-in-a-box options.

* Yahoo BOSS, the old standby, which now looks vulnerable in the light of the Delicious closure.

* Spinn3r. But it can only supply “A-list” blog content (so possibly not much use for hyperlocal indexing of a city-region), and you have to build your own widget to hook into its API.

* 80 Legs is a pricey, monthly-subscription web crawler. I’m not sure whether their stated ‘URL limit’ refers to the number of URLs in the originating site list or to the number of files their crawler actually finds. If it’s the latter, you could run out of space very fast.

* And of course the new Blekko, which lets you upload a text file full of your selected URLs and then uses them to create a ‘slashtag’ that restricts people’s searches to those sites. This last one is interesting, and I might eventually have a play around with it, though possibly not until it stops limiting you to 1,000 URLs and lets you use wildcards in the URL list.

It’s great to see some competition to Google CSEs emerging, and perhaps it will eventually spur Google into offering a commercial ‘Deep’ Web-wide version of the Custom Search Engine: full-text deep indexing of all the documents found at any website it’s pointed at; all of those documents drawn on to produce your custom search results, every time; and 12,000 URLs for the user to play with. Or perhaps Microsoft Bing will offer such a service. It might be limited to non-profits, so as to keep the SEO spivs out.