Question
I have rows in a list that are sometimes identical up to the first space character and then differ (e.g. a date follows the username).
wsmith jul/12/12
bwillis jul/13/13
wsmith jul/14/12
tcruise jul/12/12
I can easily sort the lines, but I'd like to remove the duplicate entry with the later date. I found a regex suggestion, but it only matches lines that are exactly identical. I need to mark the entire row for every repeated username in the file. In my example above, lines 1 and 3 would be highlighted.
(edited for clarity)
Answer 1:
A compact pattern in PCRE-style syntax (which the Notepad++ regex engine accepts) to check whether a username repeats from one row to a later one would be
(?m)^(\S+).*\R(?s).*?\K\1
This will work in N++.

As you remove duplicate lines, more may become marked: each match consumes the lines between the first occurrence and its duplicate, so repeats inside that stretch are only found on a later pass.
Explanation
- (?m) turns on multi-line mode, allowing ^ and $ to match on each line
- The ^ anchor asserts that we are at the beginning of a line
- (\S+) captures non-space chars to Group 1
- .* gets to the end of the line
- \R matches a line break
- (?s) activates DOTALL mode, allowing the dot to match across lines
- .*? lazily matches chars up to ...
- The \K tells the engine to drop what was matched so far from the final match it returns
- \1 is a back-reference: match what Group 1 captured before.
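To check the result outside the editor, the same idea can be sketched in Python. This is only an illustration under assumptions: Python's built-in `re` module does not offer `\K` or `\R`, so the sketch swaps the regex for a plain grouping on the text before the first space. The function name `duplicate_rows` is made up for this example.

```python
from collections import defaultdict


def duplicate_rows(lines):
    """Return the 1-based numbers of lines whose username occurs more than once."""
    rows_by_user = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            # fields[0] is the first whitespace-delimited token (the username)
            rows_by_user[fields[0]].append(number)
    # Keep only usernames seen on two or more lines
    return sorted(n for rows in rows_by_user.values() if len(rows) > 1 for n in rows)


sample = [
    "wsmith jul/12/12",
    "bwillis jul/13/13",
    "wsmith jul/14/12",
    "tcruise jul/12/12",
]
print(duplicate_rows(sample))  # [1, 3]
```

Unlike the regex, this compares whole usernames, so a name such as "smith" is not treated as a repeat of "wsmith".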
Answer 2:
I propose this regex:
^(\S+) (?=(?s:.)*\1.*).*
It marks each line whose username appears again later in the file, so every duplicate except the last occurrence is marked.
regex101 demo
^ # Beginning of line
(\S+) # Match and store non-spaces
# One space
(?= # Positive look-ahead begin
(?s:.)* # Match any character including newlines
\1.* # Match the matched group (i.e. the username) and anything following on same line
) # End lookahead
.* # Match anything remaining on line (mainly for the first match)
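The commented pattern above can be tried outside Notepad++ as well. The following is a minimal sketch in Python, assuming Python 3.6 or later (for the scoped `(?s:.)` flag); the name `FIRST_DUPLICATES` and the sample text are only for illustration. The literal space is written as `[ ]` because `re.VERBOSE` ignores unescaped whitespace.

```python
import re

FIRST_DUPLICATES = re.compile(
    r"""
    ^            # beginning of line (needs re.MULTILINE)
    (\S+)        # match and store non-spaces (the username)
    [ ]          # one space
    (?=          # positive look-ahead begin
      (?s:.)*    #   any characters, including newlines
      \1.*       #   the same username again, plus the rest of that line
    )            # end look-ahead
    .*           # anything remaining on the current line
    """,
    re.MULTILINE | re.VERBOSE,
)

text = "wsmith jul/12/12\nbwillis jul/13/13\nwsmith jul/14/12\ntcruise jul/12/12"
marked = [m.group(0) for m in FIRST_DUPLICATES.finditer(text)]
print(marked)  # ['wsmith jul/12/12']
```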
If Notepad++ marked all capture groups, you could use this variant to highlight all duplicates, including the last one:
^(\S+) (?=(?s:.)*(\1.*)).*
regex101 demo
But unfortunately (at least for v6.5.2), N++ doesn't mark the capture groups.
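In an environment that does expose capture groups, the second group can be read directly. A small sketch in Python, assuming Python 3.6 or later; the names `PATTERN` and `pairs` are illustrative only. Because `(?s:.)*` is greedy, group 2 ends up holding the last line that starts the repeated username.

```python
import re

# The capture-group variant from the answer, compiled with multi-line mode
PATTERN = re.compile(r"^(\S+) (?=(?s:.)*(\1.*)).*", re.MULTILINE)

text = "wsmith jul/12/12\nbwillis jul/13/13\nwsmith jul/14/12\ntcruise jul/12/12"

# group(0) is the marked line, group(2) is the later duplicate found by the look-ahead
pairs = [(m.group(0), m.group(2)) for m in PATTERN.finditer(text)]
print(pairs)  # [('wsmith jul/12/12', 'wsmith jul/14/12')]
```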
Source: https://stackoverflow.com/questions/24947409/match-partially-duplicated-lines