News from JURN

~ search tool for open access content

Category Archives: Academic search

OA search, group test: mongolian folk song

06 Sunday Dec 2015

Posted by futurilla in Academic search, My general observations

This post is another in my series of group tests. It tests public tools used for searching across open access (or otherwise free) academic papers, theses and/or books.

I decided to re-visit the search: mongolian folk song. It’s not a sophisticated search, but rather the sort that an undergraduate might initiate. JURN’s first ever group test, way back in 2009, used the same keywords. That old first test was perhaps rather hastily dashed off, but let it stand. Readers may be able to judge something of the progress (or not) made by search tools during the last six years, by roughly comparing the 2015 results with those of 2009.

Note that I’ve also added some new search tools in this test, new since the last group test in July. The additions are CHORUS, SHARE, Q-Sensei, WorldWideScience, NDLtd, SciLit and EThOS.

One especially worrying conclusion is the penetration of articles from certain questionable publishers into several search services. Read the test for details.


Search: mongolian folk song

JURN group test: mongolian folk song
 
December 2015. Searching for free full-text academic articles, book chapters, dissertations/theses or other substantial content in English. I clicked through on possible results and evaluated.
DOAJ 0   Used ‘Article’ search. 0 from zero results.
JournalTOCS 0   Zero from zero results. I’m not sure why the JournalTOCS search is consistently at zero, when the service is otherwise rich in article titles and abstracts. The same zero results appeared in several different Web browsers, so it’s not my browser blocking some script. A Google site: search across JournalTOCS suggests that at least one article in the Journal of Ethnopharmacology might have appeared.
Paperity 0   Checked first 25 results. All medical/science results, with no result being an ethnographic or musicology paper.
JournalSeek 0   Zero from four results. To be fair I should mention that JournalSeek is meant to find journals themselves, rather than their articles.
British Library EThOS 0   0 from zero results.
Q-Sensei 0   0 from six results, all results went to bibliographic records with no fulltext.
Microsoft Academic 0   0 from zero results.
CHORUS –   The new beta search tool heavily supported by the major commercial publishers, intended to be a “Clearinghouse for the Open Research of the United States”. Currently allows searches only by “Funder name”. Therefore, it was not able to be tested. An earlier solo test suggested that many paywalled articles are being included in CHORUS.
SHARE 0   This is the beta of a major well-funded U.S. search tool which eventually aims to become a comprehensive search engine for the world’s repositories. Currently JURN covers the fulltext in about 90% of the same repositories. SHARE showed a promising ability to focus results only on Mongolia, and to avoid semantic entanglements with other national folk musics or with the historic SONG dynasty of China. But the results were all from science, geology, natural history or agriculture. The top three results were from Mongolian Journals Online which currently hosts five open journals. Mongolian Journals Online is not yet in JURN, but will be added soon (it was not added to JURN during/for this test).
Ingenta Connect 0   Zero from one result. One result did appear promising, despite a somewhat misleading title. It detailed an expedition through the Mongolian tablelands which had combined anthropological folksong gathering with attention to local people’s tacit oral knowledge of subtle ecological changes to their landscape — but sadly this article proved to be paywalled for $25 by publisher Peter Lang.
Mendeley 0   Searched ‘Articles’ only, then filtered for Open Access articles only. The first ten results kept some focus, but then dissipated into articles on bird song and humpback whale song. Only two results, on ethnobotanical knowledge among Mongolian nomadic herders, came close to the topic.
CORE 0   Filtered by full-text only. Looked at the first 50 results. CORE’s semantics are obviously still going far astray when presented with this type of wide search. But at least there was no confusion with the Chinese personal name SONG, and halfway through the results there was a detectable but weak focus on Tibetan songs. The first result had a broken link to its fulltext.
OAlib 0   OAlib gave a jumble of general results for various national and regional folk songs, even straying into articles on bird song for a few results. But there was perhaps some attempt at results ranking going on, since a few highly ranked results were to articles on Russian and Turkish folksong. By the third page of results it was picking up articles on the “SONG Dynasty” of China, and also offered “A Review of Mongolian Herbal Medicine” (2008), by author Lin SONG. The latter proved to have a broken link to the fulltext. OAlib’s results now show a sidebar ad for an OALib Journal, with a $99 submission fee. I note that OALib Journal is not a journal included in the DOAJ.
Google News 0   Google News can be surprisingly useful for triangulating contemporary aspects of one’s search topic. It is able to surface reviews, obituaries and journalistic articles from the last six months to a year. A searcher must force verbatim, by enclosing a word in quote marks, otherwise results are very poor. “Mongolian” folk music was thus used for this test. Results revealed that a Mongolian folk metal band, Tengger Cavalry, is currently on a 12-date tour of the west, an event which has spurred short perceptive profiles by music specialists in Chicago Reader and Noisey. The South China Morning Post declares (rather belatedly, since I was listening to them years ago) that in the west, “The next big thing: Hanggai, Mongolian folk rockers…” and the Canadian National Post has a short feature article on Hanggai (plus video) from July 2015 when they toured Canada. ECSN offers a profile of the award-winning Mongolian folk band Haya, and elsewhere the paper briefly claims that the Inner Mongolia Autonomous Region is racing to record the tradition among older people there… “interviewing and recording performances by the inheritors since September [2015], prioritizing those above 70 and the sick.” Xinhua has published online an English translation from the China Daily of the article “Mongolian man on a musical mission”, a profile of young folk musician Bodee Borjigin who is in training at the Berklee College of Music (USA). However, none of these items were judged to be quite substantial enough to count as a ‘hit’ for the purposes of this test.
WorldWideScience 1   The second result, “On the Mongolian Folk Drawling Song” was a concise and readable overview from 2014, by a Mongolian senior lecturer writing in English for a small Japanese journal. A link to “Disappearing Horchin Mongolian Narrative Songs” at Cambridge proved to have the fulltext locked down behind a password box. Other results were either for plain records or were wildly off-topic.
PQDT Open 1   One result, from 50 results (checked all 50). This search tool is specifically meant to find open theses and dissertations. Found “Mongolian folklore expressed through music technology original multimedia soundtrack “On Horseback””, which was a very short 2014 Masters dissertation meant to accompany and describe the making of a multimedia production.
NDLtd 1   Checked first 50 results. A jumble of general results for various national and regional folk songs, with a slight overall focus on China (where NDLtd appears to be based?). Result no.1, “Folk songs in telugu and kannada”, was a broken link. “Dali and the Song-Mongolian war” was an account of the role of the Dali people in a dynastic Chinese war. Result no.11 was a hit (same as the PQDT Open result, above). A bare record for “A STUDY OF THE MONGOLIAN LAO KIDA” proved to be a broken link. A Hathi link for “Imagining the Chinese tradition: the case of Hua’er songs, festivals, and scholarship” was to a cursory record page with no fulltext. Down at result no.50, “The Mongolian Prairie Dizi’s Music Culture Research” appeared to be in Chinese and anyway had no fulltext available.
OATD 2   2 from two results. The first hit was the same result as at PQDT Open (see above). The second was excellent and contemporary, a substantial four-year 2011 PhD by a Chinese student at the University of Maryland, “Chasing the Singers: The Transition of Long-Song (Urtyn Duu) in Post-Socialist Mongolia”.
OpenAIRE 2   Filtered for “English” and “Open Access”, checked first 50 results. The first result, to “Disappearing Horchin Mongolian Narrative Songs” at Cambridge, proved to have the fulltext locked down behind a password box. “Videofiberoptic laryngeal data and acoustic analysis of the ornamentations used in Mongolian Long Song” proved to be only a conference paper abstract, as did a similar prospect later in the results. “A Comparison of the Image of Home in Mongolian Horqin Lyric Folk Songs and American Cowboy Songs” and “From Gada Merin to Jesse James: A Comparative Study on the Image of Heroes in Mongolian Horqin Folk Songs and American Western Cowboy Songs” both came from the Canadian Center of Science and Education (see the test for SciLit, below, for details on this publisher). There was also an unfortunate crop of off-topic medical results from the early part of the 20th century, e.g. “Brain of a Mongolian Imbecile” (1916) and “Defective Children of Mongolian Type” (1901).
Digital Commons Network (BePress) 3   104 results for the first pass, none seeming to be useful. I then filtered by “Arts and Humanities”. This gave 26 results, including: “Across the Red Steppe: Exploring Mongolian Music in China and Exporting it from Within”, a Masters dissertation; the dissertation-like graduate-school fieldwork report “New Representations of the “Golden Lineage”: The Mongolian Folk Rock of Altan Urag”, part of the Independent Study Project (ISP) Collection at the University of Pittsburgh; and a Masters dissertation “Norovbanzad’s Legacy: Contemporary Concert Long Song in Mongolia”. Switching over to the remaining disciplinary filters gave no further hits for this test.
BASE 3   I chose the option to “boost open access documents” and to do a verbatim search. I had 25 results, noting that some content was labelled as drawn partly from PQDT Open. Most results proved to be fieldwork videos of singers for the World Oral Literature Project which, while valuable, could not be counted for this test. One item was spurious, a 1945 newspaper article from Proviso Township in Illinois. Result No.3, “Wikibooks: Traditional Chinese Medicine” from Wikipedia, might also be deemed questionable given its medical nature. Three results were found to be valid and to have fulltext available: the short Masters dissertation “Mongolian folklore expressed through music technology original multimedia soundtrack “On Horseback””; “On the Mongolian Folk Drawling Song”; and the excellent “Chasing the Singers: The Transition of Long-Song (Urtyn Duu) in Post-Socialist Mongolia”.
SciLit 3   Three from three results: “A Comparative Study of Mongolian Folk Songs Based on the Breathing Signals of Physiological Mechanism”; and two comparative studies, “A Comparison of the Image of Home in Mongolian Horqin Lyric Folk Songs and American Cowboy Songs”; “From Gada Merin to Jesse James: A Comparative Study on the Image of Heroes in Mongolian Horqin Folk Songs and American Western Cowboy Songs”. While at first glance these seem like useful contributions, one should be careful to note the names of the publishers found via SciLit — in this case either the Canadian Center of Science and Education or the commercial China based conference organiser and journal publisher Atlantis Press. Beall notes, re: his 2015 list, that “I recommend that all researchers avoid publishing in all the journals published by the Canadian Center of Science and Education” and I note here that the CCSE’s journals are not indexed by either JURN or the DOAJ. Nor are the Atlantis Press journals in the DOAJ or JURN.
Google Search 4   Used a Web browser not signed in to Google, and used a URL that told Google to draw results from its complete index. Put “Mongolian” in quote marks, to try to force verbatim. Checked the first 50 results. Top four results were valid short video clips, followed by “Music of Mongolia” on Wikipedia. Result no.7 was a very short UNESCO page “Mongolian Traditional Folk Long Song”. Result no.10 was “Contemporary trends in the Mongolian folksong tradition”, a one-page newsletter article by a Mongolian ethnomusicologist and lecturer at Kent State. The status of the author was weighted against the shortness of the article, and this was judged a ‘hit’. The second page included the Atlantis Press paper “A Comparative Study of Mongolian Folk Songs Based on the Breathing Signals of Physiological Mechanism” — see the SciLit test (directly above) for details on Atlantis Press. The third page yielded the article “Blue Heaven, Parched Land: Mongolian Folksong and the Chinese State” on Academia.edu. By the fifth page, the ubiquitous .mp3 spam/virus sites were starting to creep in, but among these was the valid “On the Mongolian Folk Drawling Song” academic article.
OPENDoar 5   Examined first 50 results. Valid and in fulltext on the first page were: “On the Mongolian Folk Drawling Song”; the thesis “URTIIN DUU: PERFORMING MUSICAL LANDSCAPES AND THE MONGOLIAN NATION” (not seen before that point); the Masters dissertation “Moving Melodies: Contemporary Music Culture of Mongolian Nomads and Opportunities for Contextualization”; the thesis “Chasing the Singers: The Transition of Long-Song (Urtyn Duu) in Post-Socialist Mongolia”. Further pages had duplicates of earlier articles, only adding “The Negotiation of Minority Identities and Representation in the Independent Music Scene of Urban China”. This was only a short conference paper, but it was judged to be useful background on the tradition’s negotiations with modernity and with other ethnic traditions.
Google Scholar 6   Searched on “mongolian” folk song to force a partial verbatim search, and checked first 50 results with citations excluded. The no.1 result was the spurious result that had seemed to pop up everywhere in 2009, W.E.B. Du Bois’s book The Souls of Black Folk (1903). This was the only instance of the book’s appearance in this 2015 test. “Blue Heaven, Parched Land: Mongolian Folksong and the Chinese State” was in fulltext at no.5. Both “Performing identity through language: The local practices of urban youth populations in post-socialist Mongolia” and “Mongolian oral epic poetry: An overview” were judged to be background articles, but looked useful enough to be counted as hits. The survey “Ethnic minorities in Chinese films: cinema and the exotic” was judged a little too tangential to be a hit. Further hits were: “A Comparative Study of the Singing Styles of Mongolian and Tibetan Geser/Gesar Artists”; “New Representations of the ‘Golden Lineage’: The Mongolian Folk Rock of Altan Urag”; and “A Comparative Study of Mongolian Folk Songs Based on the Breathing Signals of Physiological Mechanism” (Atlantis Press, again).
FreeFullPDF 7   7 from 17 results. Result no.1 was the article from Atlantis Press, and no.3 and 6 were from the Canadian Center of Science and Education (see item on SciLit, above, for discussion on why these publishers are questionable). Seven were counted as valid hits, including the articles from these publishers.
JURN 19   Looked at first 50 results, searched for “mongolian” folk song to force partial verbatim. Results and hits (in bold) are given below. Again a key finding is that JURN is now large enough to provide results through to result No.100. So, given a well-formed search, people who are habituated to just look at the first ten results in Google should explore the full set of 100 results in JURN. Searches should ideally be specific and detailed in JURN, rather than three keywords.

JURN results:

1. “Contemporary trends in the Mongolian folksong tradition of Urtyn Duu”.

2. “Blue Heaven, Parched Land: Mongolian Folksong and the Chinese State”.

3. “On the Mongolian Folk Drawling Song”.

4. “New Representations of the ‘Golden Lineage’: The Mongolian Folk Rock of Altan Urag”.

5. Abstract for “Blue Heaven, Parched Land: Mongolian Folksong and the Chinese State”.

6. “Survival of the Fittest: The Urtyn Duu Tradition in Changing Mongolia” (Smithsonian Folkways magazine)

7. “URTIIN DUU: PERFORMING MUSICAL LANDSCAPES AND THE MONGOLIAN NATION”.

8. Academia.edu folder for articles in ‘Mongolian folk/traditional music’. (None were in English).

9. “A Comparative Study of the Singing Styles of Mongolian and Tibetan Geser/Gesar Artists” (Journal of Oral Tradition).

10. Contemporary trends in the Mongolian folksong tradition of urtyn duu (duplicate).


11. “Chinese and Western elements in contemporary Chinese composer…” (Chinese composer who mixed various traditions including Mongolian)

12. “Chasing the Singers: The Transition of Long-Song (Urtyn Duu) in Post-Socialist Mongolia”.

13. “Red Sonic Trajectories – Popular Music and Youth in China” (brief speech made in Amsterdam, 2001).

14. “Mongolian Folklore and perception of space” (A very brief article).

15. Academia.edu folder for articles in ‘Mongolian Folklore’. (Had “The Last Outstanding Mongghul Folksong Singer” from Asian Highlands Perspectives journal).

16. “UNESCO’s World of Music”, Smithsonian Folkways magazine. Brief mention of a reissue by Hungaroton/UNESCO of the album Mongolian Folk Music, “recorded by Lajos Vargyas in 1967 when Mongolia was still closed off” and which preserved for the first time the overtones in the voices.

17. Ethnomusicology OnLine review of the album Living Music of the Steppes: Instrumental Music and Song of Mongolia.

18. “Mongolian Oral Epic Poetry: An Overview”. (Tangential, but useful background).

19. “Mongolian foreign policy: the Chinese dimension.”

20. “A Comparative Analysis of Eurasian Folksong Corpora.”


21. Academia.edu folder for documents in ‘Mongolian Music’. (Lots of French articles and also “The Cuur as Endangered Musical Instrument of the Urianxai Ethnic Group in the Mongolian Altai Mountains” / “The Camel and its Symbolism in the Daily Life of the Mongols with Particular Reference to their Folk Songs”).

22. “MA SI-CONG’S VIOLIN CONCERTO IN F MAJOR: WESTERN TRADITIONS AND CHINESE ELEMENTS” (Chinese composer who mixed various traditions including Mongolian)

23. “Urtiin Duu, traditional folk long song. Mongolia, China.” (Short descriptive entry and video from the UNESCO list of Intangible Heritage. Too short to be a hit.)

24. “Cultural policy in the Mongolian People’s Republic”. (Concise and factual UNESCO overview pamphlet from 1982. A little too tangential to count as a hit, and has probably been superseded by other texts).

25. “The Mechanisms of Epic Plot and the Mongolian Geseriad.”

26. “A SURVEY OF THE UNACCOMPANIED VIOLIN REPERTOIRE, CENTERING ON WORKS BY J.S. BACH AND EUGENE YSAŸE” (tangential, composer happened to use a fragment of a Mongolian song in one work)

27. “ORIENTATION OF BRONZE AGE MOUNDS IN MONGOLIAN ALTAI MOUNTAINS” (archeoastronomy)

28. Three page book review of Mongolian Music, Dance, and Oral Narrative: Performing Diverse Identities.

29. ‘On Huaer’ and ‘Selections of Traditional Qinghai Folk Songs’ (Three page book review of these 1980s titles. The collectors are berated, in passing, for ignoring Mongolian songs).

30. Academia.edu documents in the folder ‘Buryat Folklore’. (1970s “Literary translation of three Mongolian and two Buryat shaman songs, one historical folksong and one horse praising song with notes and illustration.” — but sadly not in English).


31. “A Musical Map of Different Turkic-Speaking Peoples as based on Field Work from 1936 until the Present”.

32. Academia.edu documents in the folder ‘Mongolic languages and dialects’ (Nothing relevant in English)

33. “Conclusion: Voice and Persona” (Conclusion and bibliography of a thesis on Chinese popular music)

34. “Mongolian Music, Dance, and Oral Narrative: Performing Diverse Identities” (Book review in Journal of Folklore Research)

35. “The Mechanisms of Epic Plot and the Mongolian Geseriad”.

36. “UNESCO: Eight new elements inscribed on List of Intangible Heritage in Need of Urgent Safeguarding”. UNESCO’s listing in 2011 of: “Folk long song performance technique of Limbe performances – circular breathing, Mongolia: The Limbe is a side-blown flute of hardwood or bamboo, traditionally used to perform Mongolian folk long songs. … only fourteen individual Limbe practitioners remaining.” (One paragraph, too slight to be a hit)

37. English review of the book Mongolische Erzählungen über Geser, Neue Aufzeichnungen. Review of a German translation via Russian of the product of “two [Russian epic folksong collecting] expeditions to Mongolia in 1974 and 1976, expeditions which were organized by the Academy of Sciences of the People’s Republic of Mongolia.” Followed by a review of the German 1985 book Mongolische Epen XI, with very close attention paid by one of the then-experts in the field.

38. Academia.edu documents in the folder ‘Oirat History’. Tangential result: “Yurts [tents] in Be si chung, a Pastoral Community in Amdo: Form, Construction, Types, and Rituals”, and “The Khotons of Western Mongolia” (1979).

39. “Life on the divide: the Buriad people and the world’s longest border” (University of Cambridge research article from 2013, describes “A major project – Where Rising Powers Meet – looks at life along the border that separates Russia, China and Mongolia.” Passing mention of the Buriad folk song tradition).

40. “Hawaii Chinese Dancing and Songs Theatre”. Paragraph of biography for “Juan Huang […] traveled extensively within China [in the 1980s], collecting the folk dance forms of many of China’s 56 distinct ethnic groups.” She ended up in Hawaii in 2003 where she founded and ran the Hawaii Chinese Dancing and Songs Theatre troupe.


41. “Landscape in Words: The Natural World in Mongolia Folk Literature and Contemporary Poetry”. (One page research project summary by a 2005 Fulbright Scholar).

42. “Art, Ritual, and Representation: An Exploration of the Roles of Tsam Dance in Contemporary Mongolian Culture”.

43. “Landscape in Language: Representations of Homeland in Mongolian Magtaal and Song” (in American Center for Mongolian Studies newsletter, Spring 2006)

44. “Dream and Sacrifice” (profile of composer Kimmo Hakola, who happened to use a Mongolian song fragment in a work)

45. “The Mongolian Big Dipper Sutra” (Journal of the International Association of Buddhist Studies. Discovery that a Mongolian epic song “preserves a Chinese Buddhist text for the worship of the seven stars/Buddhas of the Big Dipper [star constellation] that is not found in the Chinese canon.”)

46. Academia.edu documents in the folder ‘Mongolian History in Qing period’. Possibly of tangential interest would be “Religion and Mongol Identity in the mid-19th Century Urga. On the Basis of a Mongolian Monk’s Oral Narratives Recorded by Gabor Balint of Szentkatolna in 1873”.

47. “Concept Paper for an Inner Asian and Mongolian Studies Collaborative Online Reference Guide”.

48. Academia.edu documents in the folder ‘Mongolian History in Qing period’. (Has “The Melodic System of Pentatonicism (A sketch about the Mongolian version)” and “”Voices that Soar like Wind Through the Mountains”: Mongol-European Hybridity, Ecomusicology and Compassion in Urtyn Duu Long-Song”).

49. Bibliography for the works of Alexandra Arkhipova, at the Centre of Typological and Semiotic Folklore Studies, Russian State University for the Humanities.

50. Duplicate of no.49.


Skipping lightly through the next 50 results (JURN provides 100 results), one can easily note a number of background items such as:

* “The cultural anthropology of the Sino-Mongolian frontier”.

* “Folklore and Folklife of Central Asian Women”.

* UNESCO 1983 overview booklet “Cultural policy in the People’s Republic of China: Letting a hundred flowers blossom”.
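
One rough way to compare the tools in the table above is to set the hits against the number of results checked. What follows is only a minimal Python sketch, not something used in the original test. The figures are copied from those table rows that state both numbers, and the names `RESULTS`, `hit_rate` and `ranked_by_hits` are invented here for illustration.

```python
# Hits and results checked, as reported in the December 2015 table above.
# Only tools whose row states both numbers are included (an editorial choice).
RESULTS = {
    "PQDT Open": (1, 50),
    "OATD": (2, 2),
    "BASE": (3, 25),
    "SciLit": (3, 3),
    "Google Search": (4, 50),
    "OPENDoar": (5, 50),
    "Google Scholar": (6, 50),
    "FreeFullPDF": (7, 17),
    "JURN": (19, 50),
}


def hit_rate(hits, checked):
    """Share of checked results judged to be hits; 0.0 when nothing was checked."""
    return hits / checked if checked else 0.0


def ranked_by_hits(results):
    """Sort tools by raw hit count, breaking ties by hit rate (both descending)."""
    return sorted(
        results.items(),
        key=lambda item: (-item[1][0], -hit_rate(*item[1])),
    )


for tool, (hits, checked) in ranked_by_hits(RESULTS):
    rate = hit_rate(hits, checked)
    print(f"{tool:15s} {hits:2d} hits from {checked:2d} checked ({rate:.0%})")
```

A raw hit rate says nothing about quality. SciLit scores three from three, for instance, yet all three of its hits came from the questionable publishers discussed in its table entry.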

“Nah, that’s not a dead PLEIADI… it was just resting…”

03 Thursday Dec 2015

Posted by futurilla in Academic search, Spotted in the news

The search tool for Italian repositories, PLEIADI, has now returned. It must have been taken down for a big overhaul, when I spotted it had gone 404 earlier this week. It’s now a very usable and neatly re-designed portal, enabling search across the 100 or so Italian open access repositories. In the new PLEIADI a searcher can even filter to show only Open Access content and filter for English language results. Very nice, including over 5,000 open access books…

The Academia.edu advantage

23 Monday Nov 2015

Posted by futurilla in Academic search, Open Access publishing, Spotted in the news

Post your articles to Academia.edu as soon as they’re published, get more citations….

“Based on a sample size of 34,940 papers, we find that a paper in a median impact factor journal uploaded to Academia.edu receives 41% more citations after one year than a similar article not available online, 50% more citations after three years, and 73% after five years. We also found that articles also posted to Academia.edu had 64% more citations than articles only posted to other online venues, such as personal and departmental homepages, after five years.” [the conclusion expands this “other” element, it includes: “journal site, or any other online hosting venue”]

The studied papers were uploaded at “the same time they’re published”. Excluded from the study were… “articles uploaded to Academia.edu after they were published”.

Amazingly, the authors also note that…

“To our knowledge there has been no research on what features of open access repositories or databases make articles easier to discover”

All that public money spent on repositories around the world, and not one librarian has felt the need to test for such public discoverability vectors? Seriously?

oaFindr

22 Sunday Nov 2015

Posted by futurilla in Academic search, Open Access publishing, Spotted in the news

A new Canadian commercial start-up is offering its new oaFindr service, with free / low-cost trials for university libraries. oaFindr is said to be able to explore a library’s existing journal subscriptions, and to identify just the open access articles within the hybrid journals. According to the press release oaFindr…

“… enable[s] academic institutions to analyze their journal subscriptions and provide[s] them with a reliable, precise search and discovery tool to retrieve all open access articles. This solution will also help them comply with governmental open access mandates, and support them in rapidly increasing the diffusion of their institutions’ scholarly production in a manner that is much less labour-intensive”

The idea appears to be that the discovered OA articles are then harvested and passed to the company’s related oaFoldr service, with oaFoldr providing a conduit into their hosted repository for the OA articles. Nice if it works and gets adopted and, if public, it would provide a welcome new mega-repository for Google and JURN to index. Alternatively, I suppose that the oaFoldr may just be a private folder for cataloguers, in which the articles reside before being placed into the university’s own repository. More likely to be the latter, since otherwise one commercial company could potentially get to corral the world’s OA article output in its own repository, and would then be in a position to sell it back to universities via an enhanced search and mining/metrics service.

Regrettably, as Bernard Rentier observes, mass extraction and archiving of 1000s of OA articles per month from commercial databases may not be welcomed by the big publishers…

“Elsevier has designed a way to prevent researchers from mass-downloading articles from its website where they are so-called open access…”

So how would universities harvest efficiently? Bear in mind that commercial licenses may also prevent a university from taking the proprietary hybrid journal metadata from the likes of Elsevier, Springer, Oxford etc, along with their OA fulltext PDFs. So I guess it’s much more likely that each institution will play safe and harvest only PDF articles by their own researchers, thus giving a much lower harvesting volume that might not trigger download blocking. And that they’ll find ways not to take any metadata generated around the OA article by publisher databases.

I wonder if some large institutions may have to harvest articles via spoofing multiple ‘student’ accounts? Or is oaFindr itself pre-harvesting OA PDFs from hybrid journals and then vending them to institutions along with metadata? Probably not, or the big publishers would likely be throwing lawsuits at the company. oaFindr seems more likely to be a sort of super-Paperity, but covering all hybrid titles from the big publishers plus all the DOAJ titles at the article level. I’m guessing a lot here, of course, but if such a service works then it would be rather cool. Though probably lacking in things like Google-strength semantics and relevance ranking.

So let’s assume that the university libraries are the ones that do the work of harvesting OA PDFs for their repositories. OA mandates and the consequent exponential growth of OA articles may still lead to the hitting of a ‘mass downloading’ roadblock in the near future, even at a university which restricts itself to its own outputs and/or harvests fulltext via multiple accounts. Big publishers might even change their database small-print, so as to forbid ‘type targeted’ mass harvesting leading to local storage of articles.

I guess one solution would then be to rely only on having repository records + Web links to the fulltext (fulltext hosted back on the journal’s website). Though that assumes that links don’t break. Which they do, and at a horrendous rate.

In the end I suspect it may just be easier for a university to go after its research staff with pitch-forks, and literally force them to upload their OA papers to the university repository. If your new paper isn’t in the repository after 28 days, then your next month’s salary gets docked 20% and your department can’t apply for any new funding or external partnerships in the next six months. That sort of thing.


Update, Nov 2017: oaFindr is now called 1Findr.

Semantic Scholar

03 Tuesday Nov 2015

Posted by futurilla in Academic search, Spotted in the news

Another month, another search-engine for the well-thumbed corpus of academic articles in Computer Science. Semantic Scholar is a touch different though, as it’s been developed at the Paul Allen Institute for Artificial Intelligence and it just searches 3 million open access papers. As such I guess that most Computer Science students may come to think of it as just a much more elegantly designed and somewhat faster equivalent of Microsoft Academic, minus the pesky records with no PDF links.

Semantic Scholar reportedly plans to expand to the neurosciences and biomedical by 2016-18. And, of course, one should never underestimate the Microsoft tortoise/hare growth method (Allen is a Microsoft founder) — what looks like a lackluster tortoise at first slowly builds and redefines, and re-builds and expands again over the years, until suddenly it’s out in front of the race. That process stalled with the reported ceasing of further development on Microsoft Academic, but it may be that Semantic Scholar is effectively Microsoft’s arms-length second try at that? Just my guess.

As with most such ventures, it seems to be cloaking the allegedly A.I. / semantics-assisted development of something far more commercial and widely applicable: accurate automatic full-text detection (CORE could only get to around 27% with that on academic repositories, last I heard), then document structure evaluation, extraction, segmentation and re-formatting. Which is nice, if one only has to organise an interface for a very well-behaved corpus of Computer Science papers. Semantic Scholar certainly looks like it can do that, and elegantly too, though I’m not qualified to comment on its relevancy ranking or the alleged semantics aspects. But I suspect we’re still many decades from having an autobot that can tame the messy Wild West of open publishing in that manner.

Google Scholar and grey literature

28 Monday Sep 2015

Posted by futurilla in Academic search, JURN's Google watch, Spotted in the news

Interesting new paper at PLOS One, “The Role of Google Scholar in Evidence Reviews and Its Applicability to Grey Literature Searching”.

Test searches were drawn from review papers…

“…chosen as they covered a diverse range of topics in environmental management and conservation, and included interdisciplinary elements relevant to public health, social sciences and molecular biology.”

… and compared alongside Web of Science results…

Surprisingly, we found relatively little overlap between Google Scholar and Web of Science (10–67% of WoS results were returned using searches in Google Scholar using title searches).
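For anyone wondering what an overlap figure of this kind means in practice, here is a minimal Python sketch. It is not the paper's method: the title normalisation, the function names and the sample titles are all my own assumptions, invented purely for illustration.

```python
import re

def normalise_title(title):
    """Lower-case a title and collapse punctuation and spacing, so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def overlap_percent(reference_titles, candidate_titles):
    """Percentage of the reference titles that also turn up among the candidate titles."""
    reference = {normalise_title(t) for t in reference_titles}
    candidate = {normalise_title(t) for t in candidate_titles}
    if not reference:
        return 0.0
    return 100.0 * len(reference & candidate) / len(reference)

# Invented placeholder titles, not real search results.
wos_titles = [
    "Sample paper on oil palm and biodiversity",
    "Sample paper on land use change",
    "Sample paper on habitat loss",
]
scholar_titles = [
    "Sample Paper on Oil Palm and Biodiversity.",
    "Sample paper on habitat loss",
    "Some unrelated report",
]

share = overlap_percent(wos_titles, scholar_titles)
print(f"{share:.0f}% of the reference titles were also found")
```

Here two of the three placeholder reference titles are matched. Real title matching would need to cope with subtitles, translations and truncation, which this sketch ignores.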

Unsurprisingly, Google Scholar wasn’t found to be the one-stop shop many assume it to be…

“… some important evidence was not identified at all by Google Scholar … [so it] should not be used as a standalone resource in evidence-gathering exercises such as systematic [literature] reviews.”

Interesting finding also that…

“‘Peak’ grey literature content (i.e. the point at which the volume of grey literature per page of search results was at its highest and where the bulk of grey literature is found) occurred [in Google Scholar] on average at page 80 (±15 (SD)) for full text results … page 35 (± 25 (SD)) for title [search] results.”

So this suggests that one might usefully flick through to result 700 (of 1000) and work a few hundred results starting from there, if seeking grey literature with a very well-formed topic search? By well-formed I mean the sort of sophisticated literature-review style of search term chaining being used in this study, for example…

“oil palm” AND tropic* AND (diversity OR richness OR abundance OR similarity OR composition OR community OR deforestation OR “land use change” OR fragmentation OR “habitat loss” OR connectivity OR “functional diversity” OR ecosystem OR displacement)
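To put rough numbers on the ‘flick through to result 700’ idea, here is a small Python sketch. It assumes 10 results per page (Scholar's usual default, which the quoted passage does not state) and treats the quoted mean ± SD as a simple window of pages, which is only a loose reading of the figures. The function name is hypothetical.

```python
RESULTS_PER_PAGE = 10  # assumption: default page size, not stated in the quoted passage

def result_window(peak_page, sd_pages, per_page=RESULTS_PER_PAGE):
    """Return (first, last) result numbers covered by peak_page plus or minus sd_pages."""
    first_page = max(1, peak_page - sd_pages)
    last_page = peak_page + sd_pages
    first_result = (first_page - 1) * per_page + 1
    last_result = last_page * per_page
    return first_result, last_result

full_text = result_window(80, 15)   # full text searches: page 80, SD 15
title_only = result_window(35, 25)  # title searches: page 35, SD 25

print("full text window:", full_text)
print("title search window:", title_only)
```

On those assumptions the full text window runs from result 641 to result 950, so starting at around result 700 and working through a few hundred results sits comfortably inside it. The title-search window is much earlier and wider, at results 91 to 600.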

It appears that the researchers only auto-extracted “citation records” from the search results, and then classified into broad categories based on those alone. There appears to have been no checking as to the validity of the link, and/or downloading and scrutiny of PDFs. So there are no measurements of how many of Google Scholar’s links work or lead to free no-paywall fulltext articles.

Lastly, I noted…

“Google Scholar has a low threshold for repetitive activity that triggers an automated block to a user’s IP address (in our experience the export of approximately 180 citations or 180 individual searches). Thankfully this can be readily circumvented with the use of IP-mirroring software such as Hola (https://hola.org/)”

What’s in Common Crawl?

06 Sunday Sep 2015

Posted by futurilla in Academic search

≈ 1 Comment

The Common Crawl service is an open monthly crawl of the Web. It currently weighs in at a whopping 145TB, and is seemingly limited to Web sites with high-ranking inbound links.

How does one discover if a URL is being crawled for Common Crawl? With the Common Crawl Index, a URL lookup tool for Common Crawl. It’s then apparently a fairly easy thing to work down a tree for identification and extraction of, say, just the Wikipedia index segments from the monthly crawl.

A test with some randomly selected (and rather more obscure than otherwise) JURN URLs suggests that Common Crawl does indeed visit a wide range of URLs. You can see which pages at a URL are being indexed by Common Crawl, by replacing the *.URL inside this…

http://index.commoncrawl.org/CC-MAIN-2015-06-index?url=*.jprstudies.org&output=json

In the above instance, the crawler doesn’t appear to be going very deep into the heaving bosom of Journal of Popular Romance Studies. To create a version of JURN on Common Crawl the crawler would need to be told to explicitly do a deep harvest on each URL, rather than only collecting pages with high-ranking inbound links. That’s my guess, and it might explain the sparse harvest of jprstudies.org (see above). This guess seemed to be confirmed, when I found a Common Crawl forum comment in August 2015 by Tom Morris…

“…given a fixed budget, focusing on crawling entire domains, whether by using sitemaps or other means, will, necessarily, reduce the number of domains which are crawled. Focusing on crawling all structured product data will mean sacrificing crawling popular pages.”

So the JSON output for the above link effectively tells you what your domain’s most popular pages are, as judged by inbound links from quality sources. That, in itself, may be rather useful for some.
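For those who would rather script the lookup than paste URLs into a browser, here is a minimal Python sketch of the same request. The host, crawl label and query parameters are taken from the link above. The function names are my own, and I am assuming that the response holds one JSON object per line and that each record carries a "url" field. Whether the 2015 index still answers requests is not something I can vouch for.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

INDEX_HOST = "http://index.commoncrawl.org"  # host as given in the link above

def build_index_url(domain, collection="CC-MAIN-2015-06"):
    """Build a wildcard lookup URL covering every captured page under a domain."""
    # safe="*" keeps the wildcard readable instead of percent-encoding it.
    query = urlencode({"url": f"*.{domain}", "output": "json"}, safe="*")
    return f"{INDEX_HOST}/{collection}-index?{query}"

def parse_index_lines(text):
    """Parse a response body, assumed to hold one JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def fetch_captured_urls(domain, collection="CC-MAIN-2015-06"):
    """Fetch the index records for a domain and return the distinct page URLs, sorted."""
    with urlopen(build_index_url(domain, collection), timeout=30) as response:
        body = response.read().decode("utf-8")
    # The "url" field name is an assumption about the record layout.
    return sorted({record["url"] for record in parse_index_lines(body)})

print(build_index_url("jprstudies.org"))

# Invented sample lines, to show the parsing step without a network call.
sample = (
    '{"url": "http://example.org/", "status": "200"}\n'
    '{"url": "http://example.org/about", "status": "200"}\n'
)
print([record["url"] for record in parse_index_lines(sample)])
```

Calling fetch_captured_urls("jprstudies.org") would then give a plain list of the pages captured for that domain. A large domain may well return its results in several pages, which this sketch does not handle.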

What about PDFs? It seems that some PDFs are collected and indexed alongside the HTML. Not many PDFs seem to make it into the index, though. For instance, another forum comment showed a table for the March 2015 crawl, which had 3,111,864 PDFs against 1.6Bn HTML pages. The PDFs that do make it in often appear to be truncated.
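To make that proportion concrete, here is the arithmetic as a few lines of Python. The two counts are the ones quoted above. The 1.6Bn figure is rounded, so the result is approximate.

```python
pdf_count = 3_111_864        # PDFs in the March 2015 crawl, as quoted
html_count = 1_600_000_000   # "1.6Bn" HTML pages, a rounded figure

pdfs_per_100_html = 100 * pdf_count / html_count
html_per_pdf = html_count / pdf_count

print(f"{pdfs_per_100_html:.2f} PDFs per 100 HTML pages")
print(f"about one PDF for every {html_per_pdf:.0f} HTML pages")
```

That works out at roughly 0.19 PDFs per 100 HTML pages, or about one PDF for every 514 HTML pages.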

“SFX Miscellaneous Free Ejournals Target”

19 Wednesday Aug 2015

Posted by futurilla in Academic search, Economics of Open Access, Spotted in the news

≈ 1 Comment

“SFX Miscellaneous Free Ejournals Target: Usage Survey Among the SFX Community”, Serials Review (2015), 41(2), pp. 58-68.

SFX is an OpenURL link resolver product for university libraries, focussed on the output of traditional publishers — of which 16-20% is apparently so dodgy in terms of quality that it breaks the system. Yet, rather amazingly, it appears that much of this 16-20% is still allowed to get to the point-of-use.

The article briefly surveys recent findings on how SFX copes with open access articles, and then the rest of the paper gives the results of a survey of librarians who integrate a specific ‘free’ section of SFX with their library discovery tools. It appears that scholars looking for open free full-text via SFX can expect way over 20% dead link errors on URLs…

“… one category [of failure] (incorrect parse params) alone leads to 20% false positives (dead links) for MFE [the largest ‘free’ target in SFX]. Besides incorrect parse params, there are numerous other reasons for the occurrence of false positives (dead links), such as resolver translation error, inaccurate embargo data, provider target URL translation error, incomplete provider content, wrong coverage dates, indexed-only titles mistakenly considered as fulltext titles, and other reasons listed in the literature review section.”

So that might mean… perhaps 40% of links to open access full-text are dead? Or even more, like… 60%? The article doesn’t hazard a guess.

The DOAJ ‘targets’ are apparently not much better…

“It’s an irony that I find discovery services generally have much poorer coverage of Open Access than Google Scholar. … Most discovery services have indexed DOAJ (Directory of Open Access Journals), but many libraries experience such a bad linking experience they just turn it off” (Aaron Tay, July 2015).

I’m pleased to say that JURN should have close to zero dead links on standalone journals, due to the way it is set up. JURN may lead to a few fleeting “server maintenance” / “timeout” errors here and there, but if the journal’s base URL for articles moves then its articles effectively get auto-removed from JURN’s results. But they get found again within a year at most, through an effective two-pronged method.

Google Scholar’s advantages

29 Wednesday Jul 2015

Posted by futurilla in Academic search, JURN's Google watch

A new blog post from Aaron Tay, “5 things Google Scholar does better than your library discovery service”, looking at the huge market advantages enjoyed by Google Scholar. The main points in summary:

* Intake and update: Google intakes, refreshes and updates very quickly.

* Automated detection: The Google bot spots and indexes academic articles wherever those are located.

* Relevancy ranking: It’s certainly not perfect, but is vastly better than anyone else’s.

* Clear and fast: Simple interface, a few useful widgets and filters. Additional features are accessed only via typed-in search modifiers or the well-hidden “Advanced” form.

* Cross-platform: Scholar can be tweaked to become a seamless gateway into paid subscription services.

I would also add…

* De-duplication in results. Not always perfect, not always even seen by the end user, but pretty intelligent.

Microsoft Academic Graph

19 Sunday Jul 2015

Posted by futurilla in Academic search, Spotted in the news

Microsoft Academic Graph…

“The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals and conference “venues” and fields of study. This data is available as a set of zipped text files … The file size is ~37GB.”
