问题
I want to match the longest sequence that is repeating at least once
Having:
T_send_ack-new_amend_pending-cancel-replace_replaced_cancel_pending-cancel-replace_replaced
the result should be: pending-cancel-replace_replaced
回答1:
Try this
(.+)(?=.*\1)
See it here on Regexr
This will match any character sequence with at least one character, that is repeated later on in the string.
You would need to store your matches and decide which one is the longest afterwards.
This solution requires your regex flavour to support backreferences and lookaheads.
it will match any character sequence with at least one character .+
and store it in the group 1 because of the brackets around it. The next step is the positive lookahead (?=.*\1)
, it will be true if the captured sequence occurs at a later point again in the string.
回答2:
Here a perl script that does the job:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $s = q/T_send_ack-new_amend_pending-cancel-replace_replaced_cancel_pending-cancel-replace_replaced/;
my $max = 0;
my $seq = '';
while($s =~ /(.+)(?=.*\1)/g) {
if(length$1 > $max) {
$max = length $1;
$seq = $1;
}
}
say "longuest sequence : $seq, length = $max"
output:
longuest sequence : _pending-cancel-replace_replaced, length = 32
回答3:
I have to admit that this one got me thinking. It was obvious that positive lookahead is absolutely necessary to solve this with regex. Anyhow here is how it would work in Java:
public static String biggestOccurance(String input){
Pattern p = Pattern.compile("(.+)(?=.*\\1)");
Matcher m = p.matcher(input);
String longestOccurence = "";
while(m.find()){
if(longestOccurence.length() < m.group(1).length()) longestOccurence = m.group(1);
}
return longestOccurence;
}
The thing that got me stuck was the
\\1
I knew that you could refer to a backreference in Java with
$1
but if you replace $1 with \\1 it will not work.
Will have to dig into that.
Cheers,Eugene.
回答4:
Using Perl you can do:
s='T_send_ack-new_amend_pending-cancel-replace_replaced_cancel_pending-cancel-replace_replaced'
echo $s | perl -pe 's/([^\s]+)(?=.*?\1)/\1\n/g'
Which gives:
T_
send_
ac
k-
n
e
w_
a
mend
_pending-cancel-replace_replaced
_
cancel
_
p
e
n
d
in
g-
c
a
nce
l
-replace
_re
placed
Then you need to post process it in any language or script to get longest text.
One Possible Post Processing of repeated string can be using awk:
echo $s | perl -pe 's/([^\s]+)(?=.*?\1)/\1\n/g' | awk '{ if (length($0) > max) {max = length($0); maxline = $0} } END { print maxline }'
Which prints:
_pending-cancel-replace_replaced
PS: Note longest string here is _pending-cancel-replace_replaced
来源:https://stackoverflow.com/questions/9177647/regular-epxressions-that-matches-the-longest-repeating-sequence