How do I find the shortest overlapping match using regular expressions?

無奈伤痛 2020-12-15 08:03

I'm still relatively new to regex. I'm trying to find the shortest string of text that matches a particular pattern, but am having trouble if the shortest pattern is a sub

9 answers
  •  情歌与酒
    2020-12-15 08:12

    This might be a useful application of sexegers (reversed regular expressions). Regular-expression matching prefers the leftmost starting position and, with greedy quantifiers, the longest match from that position. Non-greedy quantifiers such as .*? take care of the longest part, and reversing both the input and the pattern gets around the leftmost-match semantics.

    Consider the following program that outputs A|B|C as desired:

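    #!/usr/bin/env python3
    
    import re
    
    string = "A|B|A|B|C|D|E|F|G"
    # The target pattern a.*?b.*?c, written back to front.
    my_pattern = 'c.*?b.*?a'
    
    my_regex = re.compile(my_pattern, re.DOTALL|re.IGNORECASE)
    # Search the reversed input so the leftmost hit is the last one in the original.
    matches = my_regex.findall(string[::-1])
    
    for match in matches:
        # Flip each match back to its original orientation.
        print(match[::-1])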
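
    To see why the reversal matters, it may help to run the forward and the reversed search side by side on the same sample string. This is only a minimal sketch; the names forward_match and reversed_match are made up for illustration.

```python
import re

text = "A|B|A|B|C|D|E|F|G"
flags = re.DOTALL | re.IGNORECASE

# Forward search: the engine commits to the leftmost 'A', so even
# non-greedy quantifiers end up spanning the second 'A|B'.
forward = re.search(r"a.*?b.*?c", text, flags)
forward_match = forward.group(0)

# Reversed search: mirror both the text and the pattern, then flip
# the hit back to its original orientation.
backward = re.search(r"c.*?b.*?a", text[::-1], flags)
reversed_match = backward.group(0)[::-1]

print(forward_match)   # A|B|A|B|C
print(reversed_match)  # A|B|C
```

    Note that the reversed search anchors on the last C in the original string and works backward from there, so it is not guaranteed to return the globally shortest match on every input.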
    

    Another way is to make a stricter pattern. Say you don't want to allow repetitions of characters already seen:

    my_pattern = 'a[^a]*?b[^ab]*?c'
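
    As a rough check of the stricter pattern, here is a small sketch that runs it forward on the same sample string, with no reversal. The name strict_matches is only illustrative.

```python
import re

text = "A|B|A|B|C|D|E|F|G"

# Each gap excludes the letters already consumed, so a match cannot
# stretch across a second 'A' (or across another 'A' or 'B' before the 'C').
# re.IGNORECASE also applies inside the negated character classes.
strict = re.compile(r"a[^a]*?b[^ab]*?c", re.IGNORECASE)

strict_matches = strict.findall(text)
print(strict_matches)  # ['A|B|C']
```

    The attempt starting at the first A fails because the gap before C would have to cross the second A, so the engine moves on and matches from the second A instead.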
    

    Your example is generic and contrived, but if we had a better idea of the inputs you're working with, we could offer better, more helpful suggestions.
