Amazon Public Data Sets
20 Thursday Feb 2014
The Amazon Public Data Sets service offers free data storage for useful public domain Big Data sets, on Amazon’s uber-servers. Analysis tools too, it seems.
20 Thursday Feb 2014
Posted in Spotted in the news
The Common Crawl bots spent 2013 crawling the Web. Now their 102TB / 5 billion page index is available to anyone who wants it. For free. Re-use it freely too, under what is effectively a CC0 licence.
20 Thursday Feb 2014
Posted in Spotted in the news
Elasticsearch 1.0 has just been released. It’s an open source search engine, with four years of development under the hood. Aimed at big businesses, but free. It is currently trying to swing itself around to head toward the gap that’s looming due to the shortage of human Big Data analysts. InfoWorld magazine has a straightforward overview.
20 Thursday Feb 2014
Posted in JURN's Google watch
In an unusual move Google appears to have created its own Custom Search Engine, Custom Search for K-12 Computer Science Education. For the benefit of those outside the USA, “K-12” isn’t the name of some obscure Linux module. It seems to be U.S. educational jargon for schooling from kindergarten through 12th grade, roughly ages 5 to 18.
14 Friday Feb 2014
Posted in Spotted in the news
Web hyperlinking to freely available online content does not amount to publishing an illegal communication, says a major new ruling by the Court of Justice of the European Union. Linking to open content cannot communicate it to a new public audience, since the content is already public.
However, the ruling says nothing of the taxing of Web links to open content, a loopy idea proposed by French socialists among others.
11 Tuesday Feb 2014
Posted in Spotted in the news
The US Defense Advanced Research Projects Agency (DARPA) want to build Google-style custom search engines for spooks, or as they phrase it… “domain-specific indexing of web content [with] domain-specific search capabilities”. Might be nice if they could also invite worthy non-profits like JURN to park up on their uber-servers for free, with maybe 10,000 domain URLs to play with, compared to the 5,000 URL limit Google places on its CSEs.
06 Thursday Feb 2014
Posted in Spotted in the news
TED has the full set of statistics from Coursera, one of the leading MOOC providers…
05 Wednesday Feb 2014
Posted in Ooops!
Google Search has effectively just de-indexed most of the DOAJ, due to the DOAJ’s recent switch to a new dynamic results approach which did away with 1.4m Google-friendly static HTML pages.
Just before Christmas: 1.4m pages, including individual article records…
Today: 16 pages indexed…
03 Monday Feb 2014
JURN is now five years old, having launched in early alpha form with just 951 titles on 3rd February 2009. The current headline total of 4,690 titles works out at an average growth of around 750 titles per year, although in the calendar year of 2013 this had slowed to indexing around 350 new English language titles. However, the 350 figure was from my simple tallying from the “new titles added” blog posts — and this blog doesn’t report additions of non-English titles.
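For anyone who wants to check the "around 750 titles per year" figure, here is a minimal sketch of the arithmetic in Python. It uses only the two totals and the five-year span quoted above; the variable names are my own.

```python
# Figures quoted in the post.
launch_titles = 951      # titles at the alpha launch, 3 February 2009
current_titles = 4690    # current headline total, February 2014
years = 5                # JURN's age in years

# Average number of titles added per year since launch.
average_growth = (current_titles - launch_titles) / years
print(round(average_growth))  # prints 748, i.e. "around 750"
```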
Actually, JURN’s headline total is probably an undercount, since JURN can index nearly all French and Spanish language journals with a few “catch all” URLs for services such as Redalyc, Raco, Dialnet, and Revues, and also JStage in Japan. As their totals in humanities and arts steadily mount up, uncounted by me, so the total number of journals indexed by JURN automatically grows. The same is true of JURN’s use of single wildcard URLs that index all articles on a university’s dedicated open journal system (such as: http://ojs.library.dal.ca/*/article/ ). These two factors mean that, if I were able to do a complete recount from scratch, the real headline figure for JURN would probably be well over 5,000 arts and humanities titles.
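To illustrate why a single wildcard URL can quietly cover many journals, here is a rough Python sketch using the standard library's `fnmatch` module. This is glob-style matching for illustration only, not Google's actual CSE matching rules. The trailing `*` added in the code and the example URLs (`journal-a`, `journal-b`) are assumptions invented for the demonstration.

```python
from fnmatch import fnmatchcase

# Wildcard pattern as quoted in the post.
PATTERN = "http://ojs.library.dal.ca/*/article/"


def matches(url: str, pattern: str = PATTERN) -> bool:
    """Return True if the URL sits under the wildcard pattern.

    Assumption: a trailing "*" is appended so that anything below
    /article/ counts as covered (prefix-style matching).
    """
    return fnmatchcase(url, pattern + "*")


# Hypothetical URLs, made up for illustration only.
urls = [
    "http://ojs.library.dal.ca/journal-a/article/view/101",
    "http://ojs.library.dal.ca/journal-b/article/view/202",
    "http://ojs.library.dal.ca/journal-a/about",
]

# Keep only the URLs the one pattern would cover.
covered = [u for u in urls if matches(u)]
for u in covered:
    print(u)
```

Under these assumptions, article pages from both hypothetical journals match the one pattern, while the "about" page does not. Any new journal added to the same system would be picked up without a new entry, which is why the headline count drifts upward uncounted.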
The centralised nature of science and biomedical publishing meant that thousands of open journals in these areas could be added with little effort, and so they were experimentally included in JURN in late 2013, although their numbers were not added to the headline total of journals indexed.
The Directory of over 3,000 titles published in English continues to grow.
JURN continues to be robustly maintained and repaired.
Overall usage of JURN continues to grow, although it would be nice to have a publicity professional or two to help more people become aware of the service.