Question
I want to identify and keep duplicate, triplicate, etc. lines, i.e., all lines that occur more than once, in Notepad++. In other words, how can I delete only the unique lines?
For example, here are seven (7) separate lists, followed by the desired duplicate lines of each list. They are shown as 7 columns side by side only to save space; regard each column as an individual list. In real life, each of the 7 lists is a separate file, independent of the others.
list1 list2 list3 list4 list5 list6 list7
1 0 0 0 0 0 0
2 1 1 1 1 1 1
3 2 2 2 2 2 2
4 3 3 3 3 3 3
4 4 4 4 4 4 4
4 4 4 4 4 4 4
5 4 4 4 4 4 4
6 5 5 5 5 5 5
7 5 5 5 5 5 5
8 6 6 6 6 6 6
9 6 6 6 6 6 6
abc 7 7 7 7 7 7
abd 8 8 8 8 8 8
abd 9 9 9 9 9 9
abe <CR> 9 9 9 9
<CR> 99 99
<CR>
[Lines that occur more than once in the lists above:]
4 4 4 4 4 4 4
4 4 4 4 4 4 4
4 4 4 4 4 4 4
abd 5 5 5 5 5 5
abd 5 5 5 5 5 5
6 6 6 6 6 6
6 6 6 6 6 6
9 9 9 9
9 9 9 9
There are many solutions for eliminating duplicates (e.g., TextFX; notepad++ delete duplicate and original lines to keep unique lines), but I cannot find a solution that keeps only the duplicates.
((.*)\R(\2\R)+)*\K.+\R
@Lars Fischer: This expression works nearly OK, except that the last entry of the (presorted) list needs to be a unique line followed by an empty line (<CR>). One (suboptimal) workaround is to append an artificial (helper) unique line (e.g., zzz) followed by an empty line (<CR>) as the last two lines.
(END OF QUESTION)
UPDATE 3: This question is reposted per the Stack Overflow "ask a new question" instruction. (@AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth drew the incorrect conclusion that this question is a duplicate of "notepad++ delete duplicate and original lines to keep unique lines". This question is definitely not a duplicate of the one @AdrianHHH et al. cite.)
UPDATE 2: @AdrianHHH This question is no more "broad" (in fact, it can hardly be more specific) and no less researched than other Notepad++ questions, including the one, https://stackoverflow.com/questions/29303148, cited (wrongly) by @AdrianHHH et al. as the same question.
UPDATE: @AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth This question is different from https://stackoverflow.com/questions/29303148 because Q 29303148 (i) does not ask how to identify and keep only the lines that occur more than once, and (ii) has no answer that provides a solution for that. Q 29303148 asks "...I just need the unique lines."
Answer 1:
Here is a solution based on regular expressions and bookmarks. It works for a sorted file (i.e., each duplicated line is immediately followed by its duplicates):
- Open the Mark dialog (Search -> Mark...)
- Click Clear all marks on the right
- Check Bookmark line
- Check Wrap around
- Find what:
((.*)\R(\2\R?)+)*\K.*
- Check Regular expression and uncheck ". matches newline"
- Click Mark All
- Click Close
- Search -> Bookmark -> Remove Bookmarked Lines
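For readers who would rather script this outside Notepad++, here is a minimal Python sketch of the same goal (keep every line that occurs more than once, drop the unique ones). It is not the regex method described above: it counts occurrences instead, so it also works on unsorted input. The function name keep_repeated_lines and the sample text are made up for illustration, and empty lines are counted like any other line.

```python
from collections import Counter


def keep_repeated_lines(text: str) -> str:
    """Return only the lines of `text` that occur more than once.

    Original order is preserved; the input does not need to be sorted.
    """
    lines = text.splitlines()
    counts = Counter(lines)  # number of occurrences of each distinct line
    kept = [line for line in lines if counts[line] > 1]
    return "\n".join(kept)


# Hypothetical sample loosely modelled on list1 from the question.
sample = "1\n2\n3\n4\n4\n4\n5\nabd\nabd\nabe"
print(keep_repeated_lines(sample))
```

Because the text is split with splitlines(), a final line without a trailing line break is handled like every other line, so no helper line such as zzz is needed in this sketch.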
Explanation
The regular expression is made up of three parts:
- ((.*)\R(\2\R?)+)* is an optional block of duplicates consisting of one or more line blocks:
  - The outer ( ... )* matches zero or more such blocks of duplicated lines. (If, in your example, the three 4s were followed by two 5s, we would need this concept of a sequence of duplicate blocks.)
  - In (.*)\R(\2\R?)+ the \2 references the content of (.*), so this part matches all duplicates of one line.
  - The second \R is an optional (due to the ?) line break. Thus it is possible to match a duplicate in the last line of the file even if that line does not end with a line break.
  If there is a block of duplicated lines after the cursor position from which you start, this part will match it.
- \K then discards what we have matched so far (the duplicates) and "puts the cursor" before the first unique line.
- .* matches the next (unique) line, which gets bookmarked.
Using Mark All, we bookmark all such unique lines so that we can remove them using the entry from the Search -> Bookmark menu.
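The block-wise logic of the expression (skip runs of identical adjacent lines, remove the single lines in between) can be sketched in Python as well. Python's built-in re module does not support \K or \R, so this sketch does not reuse the regex; it uses itertools.groupby as a stand-in to form the runs. Like the regex, it assumes sorted input, where duplicates sit next to each other. The function name drop_unique_runs is hypothetical.

```python
from itertools import groupby


def drop_unique_runs(lines):
    """Keep only runs of two or more identical adjacent lines.

    A line that occurs again later, but not directly next to its
    duplicate, counts as unique here, just as with the regex.
    """
    kept = []
    for _, run in groupby(lines):  # groups consecutive equal lines
        block = list(run)
        if len(block) > 1:  # a block of duplicates: keep all of it
            kept.extend(block)
    return kept


# Three 4s followed by two 5s: both blocks survive, single lines do not.
print(drop_unique_runs(["1", "4", "4", "4", "5", "5", "6"]))
```

On unsorted input such as ["4", "1", "4"] this returns an empty list, which is why the file must be sorted first for this approach.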
Source: https://stackoverflow.com/questions/46727299/find-and-keep-all-duplicate-lines-instead-of-unique-lines-in-a-text-file