notepad++ check for duplicate lines complex

Posted by 假如想象 on 2020-01-01 09:18:30

Question


Example

I have 40,000+ lines with GUIDs like this:

GUID: 0981723409871243

I need to search across all the GUIDs for duplicates.

Example:

GUID: 124432408213
GUID: 08917234071423
GUID: 0189742381
GUID: 08917234071423
GUID: 0817423423
GUID: 124432408213

I have TextFX and Compare, but how would I find out that there are two copies of 124432408213 and two of 08917234071423?

With 40,000 lines and possible duplicates, I can't easily spot them by eye. I need a way to find the duplicates.

It would have to be something like: take the text after "GUID:", go to the next line, then continue the search for each GUID. I could write a custom program to do this, but I'm trying to avoid that. TextFX is pretty powerful; I just don't see a way to do something like this with it.

I should add a little more info here. Example:

[block1] guid: ???? more info: ??? [/block1]

This is how each block is formatted.


Answer 1:


Use TextFX to sort the input lines, keeping duplicates. Next, do a regular expression search with Bookmark Line set on the Mark tab. The search text should be ^(GUID:\s*\d+\r\n)\1; then click Mark All. Next use Menu => Search => Bookmark => Remove Unmarked Lines to remove everything except the duplicates, or use Menu => Search => Bookmark => Copy Bookmarked Lines and paste the lines where wanted. If there are four or more identical lines, the above may finish with one entry for each pair; another TextFX sort, this time removing duplicates, should remove the surplus.
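For anyone who wants to check the sort-then-match idea outside Notepad++, here is a minimal Python sketch of it. It is not the Notepad++ procedure itself: the function name find_adjacent_duplicates is made up for illustration, the sample lines come from the question, and the pattern uses \n rather than \r\n because the text is built in memory.

```python
import re

# Sample lines taken from the question.
lines = [
    "GUID: 124432408213",
    "GUID: 08917234071423",
    "GUID: 0189742381",
    "GUID: 08917234071423",
    "GUID: 0817423423",
    "GUID: 124432408213",
]


def find_adjacent_duplicates(lines):
    """Sort the lines, then return each line that is immediately repeated.

    Hypothetical helper: it mirrors "sort, keep duplicates, then mark with
    ^(GUID:\\s*\\d+\\r\\n)\\1", using \\n as the line ending.
    """
    text = "\n".join(sorted(lines)) + "\n"
    # Group 1 captures a whole line including its newline; \1 requires the
    # very next line to be identical.
    pattern = re.compile(r"^(GUID:\s*\d+\n)\1", re.MULTILINE)
    return [match.group(1).rstrip("\n") for match in pattern.finditer(text)]


duplicates = find_adjacent_duplicates(lines)
print(duplicates)
# ['GUID: 08917234071423', 'GUID: 124432408213']
```

As in the editor, a line that occurs four times is reported once per pair, so a final de-duplication step (for example sorted(set(duplicates))) may still be wanted.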

For the [block1] guid: ???? more info: ??? [/block1] case the regular expression is more complicated but ^(\[block1\] guid:\s*\d+ more info:\s*\d+ \[/block1\]\r\n)\1 finds and marks the duplicates in:

[block1] guid: 1234 more info: 5678 [/block1]
[block1] guid: 1235 more info: 5678 [/block1]
[block1] guid: 1235 more info: 5678 [/block1]
[block1] guid: 1236 more info: 5678 [/block1]
[block1] guid: 1236 more info: 5678 [/block1]
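The block pattern can be tried on this sample with a short Python sketch. The assumptions are that the lines are already sorted, that lines end in \n instead of \r\n, and that both fields are purely numeric as in the sample; the variable names are illustrative only.

```python
import re

# The already-sorted sample from the answer, with \n line endings.
sample = """\
[block1] guid: 1234 more info: 5678 [/block1]
[block1] guid: 1235 more info: 5678 [/block1]
[block1] guid: 1235 more info: 5678 [/block1]
[block1] guid: 1236 more info: 5678 [/block1]
[block1] guid: 1236 more info: 5678 [/block1]
"""

# Same pattern as above, with \n in place of \r\n.
block_pattern = re.compile(
    r"^(\[block1\] guid:\s*\d+ more info:\s*\d+ \[/block1\]\n)\1",
    re.MULTILINE,
)

# Keep one copy of each line that is followed by an identical line.
duplicated_blocks = [m.group(1).strip() for m in block_pattern.finditer(sample)]

for block in duplicated_blocks:
    print(block)
# [block1] guid: 1235 more info: 5678 [/block1]
# [block1] guid: 1236 more info: 5678 [/block1]
```

If the real "more info" field is not purely numeric, the \d+ in that part of the pattern would need to be loosened accordingly.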

On Linux or similar, a command such as sort inputFileName | uniq -c | grep -v "^\s*1\s" or sort inputFileName | uniq -d should work, depending on exactly which commands and options are available.
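Where those command-line tools are not available, the same count-and-filter idea can be sketched in Python. This is only an illustration: count_lines and duplicated are hypothetical helper names, and the sample list stands in for the contents of the input file.

```python
from collections import Counter

# Stand-in for the file contents; sample lines from the question.
lines = [
    "GUID: 124432408213",
    "GUID: 08917234071423",
    "GUID: 0189742381",
    "GUID: 08917234071423",
    "GUID: 0817423423",
    "GUID: 124432408213",
]


def count_lines(lines):
    """Count how often each line occurs, ignoring trailing line endings."""
    return Counter(line.rstrip("\r\n") for line in lines)


def duplicated(lines):
    """Return, in sorted order, every line that occurs more than once."""
    counts = count_lines(lines)
    return sorted(line for line, n in counts.items() if n > 1)


for line in duplicated(lines):
    print(count_lines(lines)[line], line)
# 2 GUID: 08917234071423
# 2 GUID: 124432408213
```

Both helpers accept any iterable of lines, so an open file object can be passed in place of the list. Unlike the regular expression approach, no prior sort is needed and a line occurring four times is still reported once.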




Answer 2:


Although my answer can't help you by now... Copy your lines into two new tabs, then use TextFX to sort tab 1 keeping duplicates and to sort tab 2 keeping unique lines only. Then move tab 2 to the other view, and finally use Compare.
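The effect of comparing the two tabs can be approximated in Python as a multiset difference: whatever remains after removing one copy of every distinct line is the surplus that Compare would highlight. This is a rough sketch of the idea and not what the Compare plugin does internally; surplus_copies is a made-up name.

```python
from collections import Counter

# Sample lines from the question.
lines = [
    "GUID: 124432408213",
    "GUID: 08917234071423",
    "GUID: 0189742381",
    "GUID: 08917234071423",
    "GUID: 0817423423",
    "GUID: 124432408213",
]


def surplus_copies(lines):
    """Return the extra copies left after removing one of each distinct line."""
    with_duplicates = Counter(lines)    # plays the role of tab 1
    unique_only = Counter(set(lines))   # plays the role of tab 2
    # Counter subtraction drops entries whose count falls to zero.
    return sorted((with_duplicates - unique_only).elements())


print(surplus_copies(lines))
# ['GUID: 08917234071423', 'GUID: 124432408213']
```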



Source: https://stackoverflow.com/questions/16940950/notepad-check-for-duplicate-lines-complex
