Use MS Excel 2007 to split a long column / list into smaller chunks

How to use MS Excel 2007 to split a long column or list into smaller chunks, for later batch processing:

Real world scenarios: You have a simple but huge list that you want to parcel out or email in equal portions to various project participants. Or you are working with an old form-based system that can only process X items at a time.

1. Get the excellent free ASAP Utilities plugin and install it in Excel. Note that you may need to enable it before its tab will appear (Top-left orb | Excel Options | Add-ins | Disabled Applications | ASAP + Go | Enable ASAP | OK).

2. Open a new sheet and paste your long list down into a single column.

3. In your new ASAP Utilities tab, click the Select button.

4. ASAP’s Columns and Rows | Select gives you a list of choices before it runs. Choose option 2 (“Conditional Row and Column Select…”) and then use the dialog box that appears. Here I’ve opted to have ASAP tell Excel to select every 25th cell…


No ‘Select’ button? Go: Options | Find and Run a Utility | Type ‘Select’ | Scroll down to “Conditional Row and Column Select…”.

5. Run Select, then exit the dialog box. The cells won’t immediately look like they’ve been selected. But if you Ctrl + C to copy them, then the familiar “marching ants” will reassuringly appear around the selected cells.

6. Now right-click your mouse anywhere inside your new group of selected cells, and choose ASAP Utilities | option 18 “Insert before and/or after each cell in your selection…”. In this new dialog choose “Insert after” and type {lf} to add a new blank line inside each of your selected cells.

7. Run the Insert process. On a long list it may take a minute to run. Each selected cell will be given double height by the added line break.


If you just need to print out an Excel spreadsheet with each list-chunk separated by a space, perhaps so that your manager can easily read through the list in printed form, then you can leave the process there.

8. Some may now want to go further. When the whole column is selected and copied out to Notepad, you will see that the 25th, 50th, 75th etc. cells appear wrapped in quote marks “”, thus…

item 24
“item 25”
item 26

That’s kind of useful, but not really, since the primitive Notepad can’t handle multi-line search/replace.

However, simply paste the same list into the free open-source Notepad++ and the list copies as…

item 24
“item 25

item 26

9. That’s perfect. So now we just use Notepad++ to search for all the quote marks and replace them with nothing. Then we have our list in chunks of 25, each nicely separated by a blank line.

10. The neatly chunked list can now be pasted back into Excel, adding real blank cells between each chunked section. You might then add a comma to each blank cell, thus giving a basic comma-delimited .csv file for use with automated mailing-list software and similar.

Or the list can simply be saved out of Notepad++ as a plain .txt list, to work with manually in clearly defined batches of 25 at a time.
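If you would rather script the job than click through Excel and Notepad++, the end result of the steps above (a long list in batches of 25, each batch separated by a blank line) can be produced directly. What follows is a minimal Python sketch, not part of the ASAP Utilities method. It assumes your list is already a plain text file with one item per line; the file names `list.txt` and `chunked.txt` and the function names `chunk_lines` and `format_chunks` are hypothetical.

```python
from pathlib import Path


def chunk_lines(items, size=25):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def format_chunks(chunks):
    """One item per line, with a single blank line between batches."""
    if not chunks:
        return ""
    return "\n\n".join("\n".join(chunk) for chunk in chunks) + "\n"


def main():
    source = Path("list.txt")  # hypothetical input file, one item per line
    if source.exists():
        # Skip empty lines so stray blanks don't count towards a batch of 25.
        items = [line.strip()
                 for line in source.read_text(encoding="utf-8").splitlines()
                 if line.strip()]
        Path("chunked.txt").write_text(format_chunks(chunk_lines(items)),
                                       encoding="utf-8")
    else:
        # No input file: demonstrate on 60 dummy items (batches of 25, 25 and 10).
        demo = [f"item {n}" for n in range(1, 61)]
        print(format_chunks(chunk_lines(demo)), end="")


if __name__ == "__main__":
    main()
```

Changing `size` gives batches of any other length, for example the X-items-at-a-time limit of an old form-based system. The output file should look like the Notepad++ result from step 9, and can be pasted back into Excel in the same way.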

The Academia.edu advantage

Post your articles to Academia.edu as soon as they’re published, and get more citations…

“Based on a sample size of 34,940 papers, we find that a paper in a median impact factor journal uploaded to Academia.edu receives 41% more citations after one year than a similar article not available online, 50% more citations after three years, and 73% after five years. We also found that articles also posted to Academia.edu had 64% more citations than articles only posted to other online venues, such as personal and departmental homepages, after five years.” [the conclusion expands this “other” element; it includes: “journal site, or any other online hosting venue”]

The studied papers were uploaded at “the same time they’re published”. Excluded from the study were… “articles uploaded to Academia.edu after they were published”.

Amazingly, the authors also note that…

“To our knowledge there has been no research on what features of open access repositories or databases make articles easier to discover”

All that public money spent on repositories around the world, and not one librarian has felt the need to test for such public discoverability vectors? Seriously?

oaFindr

A Canadian commercial start-up is offering its new oaFindr service, with free / low-cost trials for university libraries. oaFindr is said to be able to explore a library’s existing journal subscriptions, and to identify just the open access articles within the hybrid journals. According to the press release oaFindr…

“… enable[s] academic institutions to analyze their journal subscriptions and provide[s] them with a reliable, precise search and discovery tool to retrieve all open access articles. This solution will also help them comply with governmental open access mandates, and support them in rapidly increasing the diffusion of their institutions’ scholarly production in a manner that is much less labour-intensive”

The idea appears to be that the discovered OA articles are then harvested and passed to the company’s related oaFoldr service, with oaFoldr providing a conduit into their hosted repository for the OA articles. Nice if it works and gets adopted and, if public, it would provide a welcome new mega-repository for Google and JURN to index. Alternatively, I suppose that the oaFoldr may just be a private folder for cataloguers, in which the articles reside before being placed into the university’s own repository. More likely to be the latter, since otherwise one commercial company could potentially get to corral the world’s OA article output in its own repository, and would then be in a position to sell it back to universities via an enhanced search and mining/metrics service.

Regrettably, as Bernard Rentier observes, mass extraction and archiving of 1000s of OA articles per month from commercial databases may not be welcomed by the big publishers…

“Elsevier has designed a way to prevent researchers from mass-downloading articles from its website where they are so-called open access…”

So how would universities harvest efficiently? Bear in mind that commercial licenses may also prevent a university from taking the proprietary hybrid journal metadata from the likes of Elsevier, Springer, Oxford etc, along with their OA fulltext PDFs. So I guess it’s much more likely that each institution will play safe and harvest only PDF articles by their own researchers, thus giving a much lower harvesting volume that might not trigger download blocking. And that they’ll find ways not to take any metadata generated around the OA article by publisher databases.

I wonder if some large institutions may have to harvest articles by spoofing multiple ‘student’ accounts? Or is oaFindr itself pre-harvesting OA PDFs from hybrid journals and then vending them to institutions along with metadata? Probably not, or the big publishers would likely be throwing lawsuits at the company. oaFindr seems more likely to be a sort of super-Paperity, but covering all hybrid titles from the big publishers plus all the DOAJ titles at the article level. I’m guessing a lot here, of course, but if such a service works then it would be rather cool. Though probably lacking in things like Google-strength semantics and relevance ranking.

So let’s assume that the university libraries are the ones that do the work of harvesting OA PDFs for their repositories. OA mandates and the consequent exponential growth of OA articles may still lead to the hitting of a ‘mass downloading’ roadblock in the near future, even at a university which restricts itself to its own outputs and/or harvests fulltext via multiple accounts. Big publishers might even change their database small print, so as to forbid ‘type-targeted’ mass harvesting leading to local storage of articles.

I guess one solution would then be to rely only on having repository records + Web links to the fulltext (fulltext hosted back on the journal’s website). Though that assumes that links don’t break. Which they do, and at a horrendous rate.

In the end I suspect it may just be easier for a university to go after its research staff with pitchforks, and literally force them to upload their OA papers to the university repository. If your new paper isn’t in the repository after 28 days, then your next month’s salary gets docked 20% and your department can’t apply for any new funding or external partnerships in the next six months. That sort of thing.


Update, Nov 2017: oaFindr is now called 1Findr.

Walk the British Museum in Google StreetView

Oh, how wonderful. Now you can walk the floors of the British Museum, via Google Streetview, and get close-ups of 4,500 artefacts. No more trudging for miles through hordes of tourists, with nowhere to sit down except in the cafes…

“Built over 15 months with the help of a Google employee with a camera on wheels [and] completed by the Google Cultural Institute after hours, with special light-bulbs being installed to ensure the lighting remained the same through the galleries. The results can now be used by members of the public, academics who wish to study objects in detail from home, or teachers, who are being encouraged to “bring their lessons to life” through the resources.”

Also very useful for visitors who are only ever going to get one pass at an in-person visit, and who want to learn the layout of the place first in order to maximise their time at the Museum.
