
News from JURN

~ search tool for open access content


Monthly Archives: December 2022

Flickr Foundation

18 Sunday Dec 2022

Posted by futurilla in Spotted in the news


A new Flickr Foundation, set to start hiring in early 2023. Among the current aims: “restore and then grow the Flickr Commons”; provide evidence about use and usefulness; and bring in new curators. If only they hadn’t locked me out of my Flickr account years ago (due to the Yahoo crash-and-burn), I’d be among them.

“First, catch your cat…”

18 Sunday Dec 2022

Posted by futurilla in Spotted in the news


There’s a new Democracy’s Library at Archive.org, a unified hub bringing together… “more than 700 collections from over 50 government organizations, archived by the Internet Archive since 2006.” And they’re collecting more from governments around the world. The collection has a search box, constrained to the collection. As you might expect, many documents are a little dated, though many are still practical, such as How to Catch a Cat.

How to remove an erroneously added Excel hyperlink

11 Sunday Dec 2022

Posted by futurilla in JURN tips and tricks


How to remove an erroneously added hyperlink from just one cell in Excel 2007. Problem: Sometimes you paste and a hyperlink is formed, sometimes not. It seems a bit arbitrary, regardless of what you have set in your AutoFormat settings. Once a live hyperlink appears in a cell, there is no right-click | “Remove Hyperlink” in Excel 2007, only the ability to add a hyperlink.

Solution in Excel 2007:

1. Select just the hyperlinked cell.

2. Top bar | Home | go along to the far end of the bar, where the “Sort & Filter” and “Find & Select” buttons are. Next to these is “Clear”, and there you select “Clear Formats”. That should do it. You now have plain text in your cell, and it’s no longer a live hyperlink. Later versions of MS Office Excel also added a “Clear Hyperlinks” option here, at the foot of the “Clear” selection options. But here we’re assuming you’re stuck with good olde 2007.

3. Save.

How to archive a recalcitrant forum

11 Sunday Dec 2022

Posted by futurilla in JURN tips and tricks


Task: To download and safely archive a useful but very recalcitrant user-forum, one that may be at risk of going offline.

Roadblocks:

1) The forum archives can only be accessed by drop-downs that require you to input precise from-to dates (see above). Harvesters / bots cannot get past such barriers, and cannot reach the forum’s ‘deep history’ of per-post threads.

2) Even if you had the individual URL of each and every forum thread, only a proper Web browser can get and archive each forum thread URL. Automated harvesters / bots / capture utilities are quickly blocked by the forum’s server.

3) AutoIt or the newer AutoHotkey might be a solution on Windows, by calling Internet Explorer to load the URLs and then save each as a file. But my intensive searches found only arcane code fragments, and one code function. Nothing complete or even part-way complete.

The following solution thus requires a bit of manual work, though not too much. It is for a relatively small forum or sub-forum, of technical coding advice (in this case Python for 3D software) without a great weight of images being posted. In this case there are 16 master pages of links to some 500 actual forum posts, and each post has user replies appended. Each post displays as a single scrolling page and is not paginated.

Solution:

1. Find the earliest forum thread date, then manually go through and create a per-year page that shows the links to the forum threads. Save it, and also any continuation pages there may be for that year. Work through the years and, on a long-standing forum or sub-forum, you may perhaps end up with some 15-20 saved HTML pages. It should not take more than a few minutes.

2. Extract a big list of all the links in these locally saved HTML pages. I used Sobolsoft’s ‘Extract Links from Multiple HTML Pages’ Windows utility to do this, but there are other bulk link extractors.

3. Save the extracted one-per-line links list to a .TXT file, copy-paste that list to Excel and sort the list A-Z. From this sorted list you extract just the links that point to the forum threads. They should have a uniform path and pattern, allowing them to be easily identified and extracted. Save the new list to a further .TXT file.

4. Use the free Chrome-based Web browser extension DownThemAll! to load the new list .TXT (Web browser | start DownThemAll! | right-click anywhere | ‘Import from file’). You may also want to set DownThemAll! to only download one forum thread at a time (Web browser | start DownThemAll! | Cog icon in DownThemAll!’s lower right | Network | Concurrent downloads: 1).

Have DownThemAll! do the downloads. Very regrettably there is no way to have DownThemAll! save the pages from the browser to .MHT (.MHTML) or .PDF files. It saves only in the same format that the target URLs point to.

5. Because you’re using your normal Web browser and only downloading one page/post at a time, use of DownThemAll! should not trigger any traffic blocking from the targeted forum.

Great, so you have the forum threads downloaded as .HTML files. Of course, there’s a problem. The .HTML pages being saved locally are not also saving the images. When you load one of these HTML forum pages locally, the Web browser is still loading the post’s images from the online forum server. That’s good for now, but we need the images saved into a more permanent local file.

6. The only solution I found for the next bit is the Pale Moon browser (very worthy, based on Firefox) and its free MozArchiver add-on. This add-on appears to be unique, in terms of being happy to save all open tabs (rather than just one). It saves each open tab as a portable .MHT file with embedded images. You will have to be brave though, and load 50-80 tabs at a time by drag-dropping the .html files onto Pale Moon. With my RAM and workstation, I find Pale Moon has no problem with 80 at a time. After drag-drop, pause to let the tabs all load. Then “save all tabs” to .MHTML files, which is quickly done.

It’s thus relatively easy to use this method to work through 500 or so locally-saved forum post-pages, provided they are not too image-heavy.

Then when done with each batch in Pale Moon, right-click on the left-most tab and “close all tabs to the right”. Repeat until finished.

That’s it. A slightly tedious workflow, but your recalcitrant and harvester-phobic user forum is now safely archived as portable .MHT files, one per forum thread. Good local indexing/search software (DocFetcher, DTSearch etc) should have no problem indexing local .MHT files, ready for you to do keyword searches across the local archive.
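For those who would rather script steps 2 and 3 than use a link-extractor utility and Excel, here is a minimal Python sketch using only the standard library. It is not the method used above, just one possible equivalent. It assumes the saved index pages are .html files in one folder and that the thread links share a recognisable piece of text in their URL. The function names, the folder name and the URL pattern are all hypothetical. It also assumes the saved pages contain full (absolute) links; relative links would need joining to the forum’s base address first.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in a string of HTML, in page order."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


def thread_links(pages_dir, pattern):
    """Scan the saved .html pages in pages_dir and return the sorted,
    de-duplicated links that contain the given pattern."""
    found = set()
    for page in Path(pages_dir).glob("*.html"):
        text = page.read_text(encoding="utf-8", errors="replace")
        found.update(link for link in extract_links(text) if pattern in link)
    return sorted(found)


def save_links(links, out_file):
    """Write the links one per line, ready to import into a download manager."""
    Path(out_file).write_text("\n".join(links) + "\n", encoding="utf-8")
```

Usage would be something like `save_links(thread_links("saved_index_pages", "/forum/thread/"), "thread_links.txt")`, with both the folder name and the pattern changed to suit the forum in question. The resulting .TXT file is the one-per-line list that step 4 imports.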

If you ever need to convert the .MHT (.MHTML) files back, the Windows freeware MHTML Converter 1.1 will do that and has batch processing.
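As a rough alternative for peeking inside an archive without a converter, the Python sketch below reads an .MHT file with the standard library’s email module. This assumes the files are ordinary MIME multipart documents, which is how the MHTML format is usually described; I have not checked it against MozArchiver’s output specifically. The function names are hypothetical.

```python
import email


def list_mht_parts(mht_bytes):
    """Return (content_type, content_location, size_in_bytes) for each
    embedded part (page, images, stylesheets) of an MHTML archive."""
    message = email.message_from_bytes(mht_bytes)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(),
                      part.get("Content-Location"),
                      len(payload)))
    return parts


def first_html(mht_bytes):
    """Return the first text/html part as a string, or None if there is none."""
    message = email.message_from_bytes(mht_bytes)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label; fall back to UTF-8.
                return payload.decode("utf-8", errors="replace")
    return None
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `list_mht_parts` gives a quick check that the images really were embedded, since each should show up as its own image part.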


Please become my patron at www.patreon.com/davehaden to help JURN survive and thrive.
