Notepad++ deleting lines containing duplicate words

Posted by 荒凉一梦 on 2021-02-07 03:34:40

Question


I have a .txt document in which every line consists of one word followed by a date.

How can I get Notepad++ to recognize the same word on different lines and delete the duplicate lines?


Answer 1:


Assuming the dates can differ between occurrences of the same word, and you want to keep the line that appears first in the file, this should work (make sure your file ends with a newline; the pattern below also expects Windows \r\n line endings):

  1. Open the "Replace" dialog (press Ctrl+F and switch to the "Replace" tab).
  2. Under "Search Mode" at the bottom, select "Regular expression" (make sure ". matches newline" is not checked).
  3. In the "Find what:" field, type (\s*\w+ )(.*\r\n)((.*\r\n)*)\1.*\r\n
  4. In the "Replace with:" field, type \1\2\3
  5. Click "Replace" repeatedly until there are no more matches ("Replace All" does not seem to work for this; there may be a better regex for which it does, but I have not found one).

I've tested this on the file:

testing330     05:09-24/08
whatever     10:55-25/08
testing     15:57-26/08
testing667     19:22-30/08
linux     00:29-31/08
testing330     00:29-31/08
windows     12:25-31/08

And the result was:

testing330     05:09-24/08
whatever     10:55-25/08
testing     15:57-26/08
testing667     19:22-30/08
linux     00:29-31/08
windows     12:25-31/08
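
For files too long to step through with "Replace", the same rule (keep the first line for each leading word, drop later lines that start with that word) can be sketched outside Notepad++. This is a minimal Python sketch, not the regex above: the function name `drop_repeated_words` is made up for illustration, and it assumes the "word" is everything before the first whitespace on the line.

```python
def drop_repeated_words(lines):
    """Keep only the first line for each leading word, preserving file order."""
    seen = set()
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            # Assumption: blank lines are left untouched.
            kept.append(line)
            continue
        word = parts[0]  # the word before the date
        if word not in seen:
            seen.add(word)
            kept.append(line)
    return kept


# The test file from this answer.
sample = [
    "testing330     05:09-24/08",
    "whatever     10:55-25/08",
    "testing     15:57-26/08",
    "testing667     19:22-30/08",
    "linux     00:29-31/08",
    "testing330     00:29-31/08",
    "windows     12:25-31/08",
]

for line in drop_repeated_words(sample):
    print(line)
```

On the sample it drops only the second testing330 line, which matches the result shown above.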



Answer 2:


This is not a direct answer to your question, but I found this page through its title while looking for a way to simply delete duplicate lines. I found an easy way to do that here:

  1. Select all the text (Ctrl+A). Click TextFX → TextFX Tools and check "+Sort outputs only UNIQUE (at column) lines" (if not already checked).
  2. Click TextFX → TextFX Tools → "Sort lines case insensitive (at column)".
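
As a rough illustration of what "sort and output only unique lines" amounts to, here is a small Python sketch. It is an approximation rather than a description of how TextFX works internally: the name `sort_unique` is hypothetical, and it assumes that only exact duplicate lines are removed while the sort order ignores case.

```python
def sort_unique(lines):
    """Drop exact duplicate lines, then sort them ignoring case."""
    # The second key element keeps the order stable for lines that
    # differ only in case.
    return sorted(set(lines), key=lambda s: (s.lower(), s))


lines = ["pear", "Apple", "pear", "apple"]
print(sort_unique(lines))
```

Unlike the first answer, this compares whole lines, so two lines with the same word but different dates would both be kept.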



Answer 3:


You can use EditPlus on Windows or TextWrangler on Mac to sort lines and remove duplicates easily.

After Notepad++ 6.5.2 (free), you can sort lines, or you can install the "TextFX Characters" plugin using the "Plugin Manager".

TextFX includes numerous features to transform selected text. Featuring:

  * Interactive Brace Matching
  * Quote handling
  * Character case alternation
  * Text rewrap
  * Column Lineup
  * Fill Text Down
  * Insert counter text down
  * Text to code conversion
  * Numeric Conversion
  * URI & HTML encoding
  * HTML to text conversion
  * Submit text to W3C
  * Text sorting
  * Ascii Chart
  * Leading whitespace repair
  * Autoclose HTML & braces

Homepage: http://textfx.no-ip.com/textfx/




Answer 4:


Personally, I follow the steps below. Let's assume you have a single column of data, in column A.

  1. Import the data into Excel.
  2. Sort the data.
  3. Insert a function to check for duplicates. Cell B2 would be: =IF(A2=A1,"Duplicate","")
  4. Select all of column B.
  5. Copy.
  6. Use Paste Special to paste the values back in place.
  7. Sort the data according to column B.
  8. Delete all the ones marked with "Duplicate".
  9. Copy the data back to Notepad++.

I thought there was a plugin for this, but I can't find it now. Otherwise, this link may help you.
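
The spreadsheet steps above boil down to sorting the values and flagging any row that equals the row before it, which is what =IF(A2=A1,"Duplicate","") does. A minimal Python sketch of that idea follows; the function name `remove_adjacent_duplicates` is hypothetical, and the comparison here is case-sensitive, which may not match the spreadsheet's comparison exactly.

```python
def remove_adjacent_duplicates(values):
    """Sort the values, then drop each one that equals its predecessor."""
    ordered = sorted(values)
    kept = []
    for i, value in enumerate(ordered):
        # Same test as the formula: compare each row with the row above it.
        if i > 0 and value == ordered[i - 1]:
            continue  # this row would be marked "Duplicate"
        kept.append(value)
    return kept


print(remove_adjacent_duplicates(["b", "a", "b", "c", "a"]))
```

As with the Excel route, the original line order is lost because the data is sorted first.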



Source: https://stackoverflow.com/questions/18768727/notepad-deleting-lines-containing-duplicate-words
