How do I find the shortest overlapping match using regular expressions?

無奈伤痛 2020-12-15 08:03

I'm still relatively new to regex. I'm trying to find the shortest string of text that matches a particular pattern, but am having trouble if the shortest pattern is a sub…

9 Answers
  •  借酒劲吻你
    2020-12-15 08:28

    Contrary to most other answers here, this can be done in a single regex using a positive lookahead assertion with a capturing group:

    >>> import re
    >>> string = 'a|x|b|c|A|B|C'  # hypothetical sample input
    >>> my_pattern = '(?=(a.*?b.*?c))'
    >>> my_regex = re.compile(my_pattern, re.DOTALL | re.IGNORECASE)
    >>> matches = my_regex.findall(string)
    >>> print(min(matches, key=len))
    A|B|C


    findall() returns one match for every starting position where the lookahead succeeds (the lazy match beginning at that position), so you need min() to pick the shortest one.
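    The same idea can be wrapped in a small reusable function. This is a minimal sketch, not part of the original answer: the helper name `shortest_match` is hypothetical, it assumes `pattern` has no leading inline flags such as `(?i)` (pass them via `flags` instead), and it returns None rather than raising when nothing matches.

```python
import re
from typing import Optional


def shortest_match(pattern: str, text: str, flags: int = 0) -> Optional[str]:
    """Hypothetical helper: shortest match of `pattern` in `text`, or None.

    Assumes `pattern` has no leading inline flags like (?i); use `flags`.
    """
    # Wrap the pattern in a lookahead with a capturing group, so the regex
    # matches zero-width positions and every start position gets tried.
    wrapped = re.compile(f"(?=({pattern}))", flags)
    # The wrapper is group 1 even if `pattern` has its own groups,
    # because its opening parenthesis comes first.
    candidates = (m.group(1) for m in wrapped.finditer(text))
    # default=None avoids a ValueError from min() on an empty sequence.
    return min(candidates, key=len, default=None)


print(shortest_match(r"a.*?b.*?c", "a|x|b|c|A|B|C", re.IGNORECASE))  # prints A|B|C
```

    Using finditer() with group(1) instead of findall() keeps the result a plain string even when the inner pattern contains capturing groups, where findall() would return tuples.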

    How this works:

    • The regex itself matches no text, only positions in the string, which the regex engine steps through during a match attempt.
    • At each position, the regex engine looks ahead to see whether your regex would match at this position.
    • If so, the text your regex would match there is captured by the capturing group.
    • If not, nothing is captured at that position.
    • In either case, the regex engine then steps ahead one character and repeats the process until the end of the string.
    • Since the lookahead assertion doesn't consume any characters, matches that overlap one another are all found.
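    To see the steps above in action, the sketch below (using a hypothetical sample string) contrasts a plain findall(), which consumes text and therefore skips overlapping candidates, with the lookahead version, whose matches are zero-width (start() equals end()) while group 1 holds the captured text.

```python
import re

text = "a|a|b|c"  # hypothetical sample input

# Without the lookahead, each match consumes its text, so the search resumes
# after the first match and the shorter overlapping candidate is never tried.
plain = re.findall(r"a.*?b.*?c", text)
print(plain)  # ['a|a|b|c']

# With the lookahead, every match is zero-width: it matches a position,
# and the engine then moves one character forward and tries again.
lookahead = re.compile(r"(?=(a.*?b.*?c))")
for m in lookahead.finditer(text):
    # start() == end() because nothing is consumed; group(1) is the captured text.
    print(m.start(), m.end(), repr(m.group(1)))
# 0 0 'a|a|b|c'
# 2 2 'a|b|c'

print(min(lookahead.findall(text), key=len))  # a|b|c
```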
