How to remove every nth line in a list. The list is made up of repeating four-line blocks of text.
The situation:
You have a long and mostly cleaned text list that looks like this…
Random article title
Random author name
Gibbery wibble
Random journal title

Random article title
Random author name
Gibbery wibble
Random journal title

Random article title
Random author name
Gibbery wibble
Random journal title
… and you of course wish to delete all the unwanted Gibbery wibble lines. Every Gibbery wibble line is different. Indeed, there's no keyword or repeating element in each four-line data block for a search-and-replace operation to grab onto. The only repeating element is the blank line that separates each block of four lines.
So far as I can see, after very extensive searching, there’s as yet no way to deal with this in Notepad++, even with plugins and Python scripts.
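Outside Notepad++, the logic of the task itself is simple enough to state as a short standalone script. The following is a minimal Python sketch, not part of the workflow below, assuming that blocks are separated by one or more blank lines and that the unwanted line is always at the same position (third) in every block. The function name drop_nth_line_per_block is hypothetical.

```python
def drop_nth_line_per_block(text: str, n: int = 3) -> str:
    """Remove the nth line (1-based) from each blank-line-separated block."""
    blocks: list[list[str]] = []
    current: list[str] = []
    # Group lines into blocks, starting a new block at each blank line.
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    # Keep every line except the nth one in each block.
    kept = [[line for i, line in enumerate(block, start=1) if i != n] for block in blocks]
    # Rejoin the blocks with a single blank line between them.
    return "\n\n".join("\n".join(block) for block in kept)
```

For example, drop_nth_line_per_block(open("list.txt", encoding="utf-8").read()) would return the cleaned list as one string (the file name is only an illustration).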
The slower solution:
The more flexible but slower solution uses Excel. However, the latest version of Notepad++ (not the older, 32-bit version) lets you quickly take the first and vital step: deleting the blank lines with…
Edit | Line Operations | Remove Empty Lines
It's far easier to delete the blank lines from a long list in Notepad++ than to wrestle with a complex ten-step workflow in Excel for such a simple task.
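For readers who want to see exactly what this step does to the list, here is a minimal Python sketch of the same effect. The ignore_whitespace flag is a hypothetical parameter covering lines that contain only spaces or tabs; whether such lines should count as "empty" is an assumption you would decide for your own data.

```python
def remove_empty_lines(text: str, ignore_whitespace: bool = False) -> list[str]:
    """Return the lines of text with the empty ones removed.

    If ignore_whitespace is True, lines made only of spaces or tabs are dropped too.
    """
    lines = text.splitlines()
    if ignore_whitespace:
        return [line for line in lines if line.strip()]
    return [line for line in lines if line != ""]
```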
Next, copy and paste the list into a new Excel sheet. Then add these two Excel macros and run the first. Both run fine in Excel 2007. The first splits the column into chunks of four lines and places each chunk in a new column on the same sheet. If you have three lines per block, change every 4 in the macro to 3; if six lines, change them to 6; and so on.
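As a rough analogue of what the first macro does (written in Python, not the original VBA), the split step can be sketched as follows. Here block_size plays the role of the 4s you would edit in the macro, and the sketch assumes the list length is an exact multiple of it; split_into_columns is a hypothetical name.

```python
def split_into_columns(lines: list[str], block_size: int = 4) -> list[list[str]]:
    """Split a flat list into consecutive chunks, one chunk per spreadsheet column."""
    if len(lines) % block_size != 0:
        raise ValueError("list length is not a multiple of block_size")
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]
```

Row r of the resulting "sheet" is then the r-th item of every chunk, which is why the unwanted lines all end up in the same row.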
You can then delete the offending Gibbery wibble row, since those lines now line up uniformly across the spreadsheet. In this example, they all sit in row 3.
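Continuing the Python analogy (an illustration, not the Excel procedure itself), deleting one sheet row corresponds to removing the same position from every column. A minimal sketch, with delete_row as a hypothetical name and rows counted from 1 as in Excel:

```python
def delete_row(columns: list[list[str]], row: int) -> list[list[str]]:
    """Remove the given 1-based row from every column, like deleting a sheet row."""
    return [col[:row - 1] + col[row:] for col in columns]
```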
Then run the second macro, which recombines all the columns into one long list and places it on a new sheet.
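The recombining step can likewise be sketched in Python (again an analogue, not the second macro itself): the columns are stacked one after another, top to bottom, back into a single list. The function name recombine is hypothetical.

```python
from itertools import chain


def recombine(columns: list[list[str]]) -> list[str]:
    """Stack the columns back into one long list, column by column."""
    return list(chain.from_iterable(columns))
```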
The free ASAP Utilities add-in for Excel can then 'chunk' this list back into blocks, now of three lines each since the Gibbery wibble line is gone, enabling you to add a blank line between each block. Optionally, you can also add the HTML tag for a horizontal rule (<hr>).
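ASAP Utilities' own options aren't modelled here, but the result of this step can be sketched in Python. This assumes three lines per block after the deletion, and that the optional <hr> goes on its own line between blocks; both the layout and the name join_blocks are illustrative assumptions.

```python
def join_blocks(lines: list[str], block_size: int = 3, add_hr: bool = False) -> str:
    """Regroup a flat list into blocks separated by a blank line.

    If add_hr is True, an HTML <hr> line is placed between blocks as well.
    """
    blocks = ["\n".join(lines[i:i + block_size]) for i in range(0, len(lines), block_size)]
    separator = "\n\n<hr>\n\n" if add_hr else "\n\n"
    return separator.join(blocks)
```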
The same core method can be used to re-sort the lines within each block, or to number the lines within each block as: | 1. 2. 3. 4. | 1. 2. 3. 4. | Notepad++ can't yet do either of these operations.
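Both operations depend only on a line's position inside its block, which is what the column trick exploits. A minimal Python sketch of each, assuming four-line blocks and reading "re-sort" as moving lines to new positions (for example, author before title); order is a hypothetical parameter listing the old 1-based positions in their new sequence.

```python
def number_within_blocks(lines: list[str], block_size: int = 4) -> list[str]:
    """Prefix each line with its position in its block: 1. 2. 3. 4. 1. 2. ..."""
    return [f"{i % block_size + 1}. {line}" for i, line in enumerate(lines)]


def reorder_within_blocks(lines: list[str], order: tuple[int, ...], block_size: int = 4) -> list[str]:
    """Rearrange every block the same way, e.g. order=(2, 1, 3, 4) swaps lines 1 and 2."""
    out: list[str] = []
    for i in range(0, len(lines), block_size):
        block = lines[i:i + block_size]
        out.extend(block[p - 1] for p in order)
    return out
```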
The quicker solution:
If you need a quicker option, and don't need to re-sort the lines in each data block in Excel, then try the Windows freeware List Numberer. Once you've used this utility to run a simple operation on the list, a regular-expression search back in Notepad++ (.*line3.*, entered in the Replace dialog with Search Mode set to Regular expression, then Replace All) will clear all the unwanted lines. The cleared lines are left empty, so Remove Empty Lines will tidy away the gaps.
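The exact output format of List Numberer isn't shown here, so the following Python sketch is hypothetical: it assumes the blank lines have already been removed and that each line gets a repeating tag line1 to line4 appended. It then applies the same .*line3.* pattern. Because . does not match a line break, each match empties its line rather than deleting it, so the sketch drops the empty lines afterwards and strips the temporary tags.

```python
import re


def tag_lines(lines: list[str], block_size: int = 4) -> list[str]:
    """Append a repeating tag line1..lineN (hypothetical stand-in for List Numberer)."""
    return [f"{line} line{i % block_size + 1}" for i, line in enumerate(lines)]


def clear_tagged(text: str, target: int = 3) -> str:
    """Delete the lines tagged line<target>, then remove the remaining tags."""
    # Same idea as '.*line3.*' with Replace All: the match stops at the end of
    # the line, so each hit leaves an empty line behind.
    cleared = re.sub(rf".*line{target}.*", "", text)
    kept = [line for line in cleared.splitlines() if line]
    # Strip the temporary ' lineN' tags from the surviving lines.
    return "\n".join(re.sub(r" line\d+$", "", line) for line in kept)
```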