I\'m still relatively new to regex. I\'m trying to find the shortest string of text that matches a particular pattern, but am having trouble if the shortest pattern is a sub
A Python loop to look for the shortest match, by brute force testing each substring from left to right, picking the shortest:
shortest = None
for i in range(len(string)):
m = my_regex.match(string[i:])
if m:
mstr = m.group()
if shortest is None or len(mstr) < len(shortest):
shortest = mstr
print shortest
Another loop, this time letting re.findall do the hard work of searching for all possible matches, then brute force testing each match right-to-left looking for a shorter substring:
# find all matches using findall
matches = my_regex.findall(string)
# for each match, try to match right-hand substrings
shortest = None
for m in matches:
for i in range(-1,-len(m),-1):
mstr = m[i:]
if my_regex.match(mstr):
break
else:
mstr = m
if shortest is None or len(mstr) < len(shortest):
shortest = mstr
print shortest
Contrary to most other answers here, this can be done in a single regex using a positive lookahead assertion with a capturing group:
>>> my_pattern = '(?=(a.*?b.*?c))'
>>> my_regex = re.compile(my_pattern, re.DOTALL|re.IGNORECASE)
>>> matches = my_regex.findall(string)
>>> print min(matches, key=len)
A|B|C
findall()
will return all possible matches, so you need min()
to get the shortest one.
How this works:
No. Perl returns the longest, leftmost match, while obeying your non-greedy quantifiers. You'll have to loop, I'm afraid.
Edit: Yes, I realize I said Perl above, but I believe it is true for Python.