How to combine multiple regexes into a single one in Python?

长发绾君心 2020-11-30 10:18

I'm learning about regular expressions. I don't know how to combine different regular expressions into a single generic regular expression.

I want to write a sing

3 answers
  • 2020-11-30 10:34

    To run findall with an arbitrary series of REs, all you have to do is concatenate the lists of matches that each one returns:

    import re

    re_list = [
        r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',  # re1 in question
        # ...
        r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*',  # re4 in question
    ]

    matches = []
    for r in re_list:
        matches += re.findall(r, string)  # string is the text to search
    

    For efficiency it would be better to use a list of compiled REs.
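
    A minimal sketch of the compiled-list variant, assuming two hypothetical patterns and a hypothetical helper named find_all (neither comes from the question):

```python
import re

# Hypothetical patterns, used only for illustration.
patterns = [r"\d+", r"[A-Z]+"]

# Compile each pattern once so the pattern objects can be reused.
compiled = [re.compile(p) for p in patterns]

def find_all(text):
    """Concatenate the findall results of every compiled pattern."""
    matches = []
    for rx in compiled:
        matches.extend(rx.findall(text))
    return matches

print(find_all("AB12 cd 7X"))  # ['12', '7', 'AB', 'X']
```

    The results come back grouped by pattern, not in the order they appear in the text.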

    Alternatively, you could join the individual RE strings with '|' into a single pattern:

    generic_re = re.compile( '|'.join( re_list) )
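
    A small sketch of the joined form, assuming two hypothetical patterns in place of the ones from the question. Wrapping each pattern in a non-capturing group (?:...) is a defensive habit, not something plain alternation strictly requires: it keeps each pattern self-contained if the combined pattern is extended later, and it adds no capturing groups, so findall still returns whole matches.

```python
import re

# Hypothetical patterns for illustration; not the ones from the question.
re_list = [r"\d+/\d+", r"[A-Z]{2}\d"]

# Join with '|' after wrapping each pattern in a non-capturing group.
generic_re = re.compile("|".join(f"(?:{p})" for p in re_list))

text = "12/34 AB5 x 6/7"
print(generic_re.findall(text))  # ['12/34', 'AB5', '6/7']
```

    Unlike the loop over separate patterns, the combined pattern scans the text once and returns matches in the order they occur.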
    
  • 2020-11-30 10:50

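    I see lots of people using pipes, but an alternation matches only one of its alternatives at a time, so a single match reports only the first one found. If you want one match that requires and captures all of the patterns, try using lookaheads.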

    Example:

    >>> import re
    >>> fruit_string = "10a11p"
    >>> fruit_regex = r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)'
    >>> re.match(fruit_regex, fruit_string).groupdict()
    {'pears': '11', 'apples': '10'}
    >>> re.match(fruit_regex, fruit_string).group(0)
    ''
    >>> re.match(fruit_regex, fruit_string).group(1)
    '11'

    Note that group(0) is empty: lookaheads consume no characters, so the overall match has zero length.
    

    (?=...) is a lookahead. From the Python re documentation:

    Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.

    .*?(?P<pears>\d+)p finds a number followed by a p anywhere in the string and names the number "pears".
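
    For comparison, a sketch of both approaches on the same string. The names alt_regex, look_regex, found, both and missing are made up for this illustration. With finditer, an alternation does report every match, and lastgroup says which alternative matched; the lookahead version instead succeeds only when every sub-pattern is present.

```python
import re

fruit_string = "10a11p"

# Alternation with named groups: finditer yields each match in turn,
# and m.lastgroup names the alternative that matched.
alt_regex = re.compile(r"(?P<apples>\d+)a|(?P<pears>\d+)p")
found = {m.lastgroup: m.group(m.lastgroup)
         for m in alt_regex.finditer(fruit_string)}

# Lookaheads: one zero-width match that requires every sub-pattern.
look_regex = re.compile(r"(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)")
both = look_regex.match(fruit_string)
missing = look_regex.match("10a")  # no pears in this string

print(found)             # {'apples': '10', 'pears': '11'}
print(both.groupdict())  # {'pears': '11', 'apples': '10'}
print(missing)           # None
```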

  • 2020-11-30 10:54

    You need to compile all your regexes into one pattern. Check this example:

    import re
    re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
    re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
    re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
    re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*'

    # Compile the combined pattern once, outside the loop.
    generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

    # string1..string4 are the sample strings from the question.
    sentences = [string1, string2, string3, string4]
    for sentence in sentences:
        print(generic_re.findall(sentence))
    