Removing Duplicate lines with random text behind it

没有蜡笔的小新 2020-12-22 02:01

I have text like this in Notepad++

Random Text Here:188.0.0.0
Random Text Here:188.0.3.0
Random Text Here:188.2.0.0

However, some of the nu

1 Answer
  • 2020-12-22 02:35

    In Notepad++ I would try the following multi-step process.

    (1) Use a regular expression to move the IP address to the front of each line, followed by a fixed separator, changing Random Text Here:188.0.0.0 to :188.0.0.0!!Random Text Here.

    (2) Use TextFX to sort the file, removing duplicate lines.

    (3) Use a regular expression to find and remove lines that repeat an IP address. This may need multiple passes.

    (4) Use a regular expression to put the text back in the right order.

    (5) (Optional) sort the file again.

    Problems with the above approach:

    (a) The "random text" that sorts first for an IP address will be the one that is kept, not the first in the original file.

    (b) The result will be ordered by IP address or by the random text depending on whether step (5) is used.
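
To make problem (a) concrete, here is a minimal Python sketch. It is not part of the Notepad++ procedure. The sample lines and both function names are hypothetical, and Python's plain string ordering only approximates whatever sort options TextFX is set to use.

```python
def survivor_after_sort(lines):
    """Map each IP to the text that sorts first, as a sort-then-dedupe pass keeps."""
    kept = {}
    # Sort by IP first, then by the text in front of it.
    ordered = sorted(lines, key=lambda ln: (ln.rpartition(":")[2], ln.rpartition(":")[0]))
    for line in ordered:
        text, _, ip = line.rpartition(":")
        kept.setdefault(ip, text)  # first line in sorted order wins
    return kept


def survivor_in_file_order(lines):
    """Map each IP to the text of its first occurrence in the original order."""
    kept = {}
    for line in lines:
        text, _, ip = line.rpartition(":")
        kept.setdefault(ip, text)  # first line in file order wins
    return kept


# Hypothetical input: two lines share the address 188.0.0.0.
sample = ["zebra:188.0.0.0", "apple:188.0.0.0", "mango:188.0.3.0"]
```

With this sample, the sort-based pass keeps "apple" for 188.0.0.0, while the first line in the file for that address is "zebra".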

    In more detail:

    (0) Choose a character or a short string that does not occur in the input file. I will use !!.

    (1) Do a regular expression replace on the file (with the ". matches newline" option unchecked) to change ^(.*)(:\d+\.\d+\.\d+\.\d+)$ to $2!!$1.

    (2) Use TextFX to sort the file. Choosing the sort unique option may help by reducing the number of lines before the next step.

    (3) Do a regular expression replace on the file (with the ". matches newline" option unchecked) to change ^(:\d+\.\d+\.\d+\.\d+)!!(.*)\r\n\1!!.*$ to $1!!$2. The !! after \1 keeps an address such as :188.0.0.1 from also matching a following :188.0.0.10 line. When several lines share the same IP address, each pass removes about half of them, so run the same replacement repeatedly until it reports that no changes were made. You may need to alter the \r\n part depending on the line endings in your file.

    (4) Do a regular expression replace on the file (with the ". matches newline" option unchecked) to change ^(:\d+\.\d+\.\d+\.\d+)!!(.*)$ to $2$1.

    (5) (Optional) sort the file again.
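
For anyone who wants to check the effect of steps (0) to (5) outside Notepad++, the following is a rough Python sketch of the same sequence. It assumes "\n" line endings, uses Python's re module instead of Notepad++'s regex engine, and stands in for the TextFX sort with sorted(set(...)). The function name dedupe_by_ip and the resort parameter are made up for illustration.

```python
import re

SEP = "!!"  # step (0): a separator assumed not to occur in the input
IP = r":\d+\.\d+\.\d+\.\d+"

TO_FRONT = re.compile(r"^(.*)(" + IP + r")$")
# The separator after \1 stops :188.0.0.1 from matching a following :188.0.0.10 line.
DUP_PAIR = re.compile(
    r"^(" + IP + r")" + re.escape(SEP) + r"(.*)\n\1" + re.escape(SEP) + r".*$",
    re.MULTILINE,
)
TO_BACK = re.compile(r"^(" + IP + r")" + re.escape(SEP) + r"(.*)$")


def dedupe_by_ip(text, resort=False):
    """Return the lines of text with one line kept per IP address."""
    # Step (1): move ":IP" to the front of each line.
    moved = [TO_FRONT.sub(lambda m: m.group(2) + SEP + m.group(1), ln)
             for ln in text.splitlines()]
    # Step (2): sort, dropping exact duplicate lines.
    body = "\n".join(sorted(set(moved)))
    # Step (3): collapse adjacent lines with the same IP until nothing changes.
    while True:
        body, count = DUP_PAIR.subn(lambda m: m.group(1) + SEP + m.group(2), body)
        if count == 0:
            break
    # Step (4): put the text back in front of the IP.
    restored = [TO_BACK.sub(lambda m: m.group(2) + m.group(1), ln)
                for ln in body.splitlines()]
    # Step (5): optional final sort by the leading text.
    return sorted(restored) if resort else restored


# Hypothetical sample input.
sample = "foo:188.0.0.1\nbar:188.0.0.10\nbaz:188.0.0.1\nfoo:188.0.0.1\nqux:188.2.0.0"
result = dedupe_by_ip(sample)
```

For this sample, result is ["baz:188.0.0.1", "bar:188.0.0.10", "qux:188.2.0.0"]: the line for 188.0.0.1 that survives is the one whose text sorts first, as described in problem (a).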
