Python regex: Alternation for sets of words

我在风中等你 2020-12-19 17:41

We know \ba\b|\bthe\b will match either word "a" or "the".
I want to build a regex expression to match a pattern li
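A quick check of the word-level alternation described in the question, using Python's standard re module (the sample sentence is made up for illustration). Because | has the lowest precedence in a regex, extending the alternation into a longer pattern needs a group around it, here a non-capturing (?:...) group:

```python
import re

text = "the cat ate a bat and then another"

# \b on both sides keeps 'a' and 'the' from matching inside 'ate', 'then' or 'another'.
whole_words = re.findall(r'\ba\b|\bthe\b', text)
print(whole_words)  # ['the', 'a']

# | binds loosest: without the group, r'\ba|the\s+cat' would parse as
# '\ba' OR 'the\s+cat'. The (?:...) group keeps the alternatives together.
phrases = re.findall(r'\b(?:a|the)\s+(?:cat|bat)\b', text)
print(phrases)  # ['the cat', 'a bat']
```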

6 answers
  •  無奈伤痛
    2020-12-19 18:38

    An interesting feature of the third-party regex module is the named list. With it, you don't have to write several alternatives separated by | inside a non-capturing group: you define the list beforehand and refer to it in the pattern by name with \L<name>. Example:

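    import regex  # third-party module: pip install regex
    
    words = [['a', 'the', 'one'], ['reason', 'reasons'], ['for', 'of']]
    
    # \L<name> refers to a named list passed as a keyword argument to compile();
    # \m and \M are the regex module's start-of-word and end-of-word anchors.
    # regex.X (verbose mode) ignores the literal spaces in the pattern.
    pattern = r'\m \L<word1> \s+ \L<word2> \s+ \L<word3> \M'
    p = regex.compile(pattern, regex.X, word1=words[0], word2=words[1], word3=words[2])
    
    s = 'the reasons for'
    
    print(p.search(s))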
    

    This feature isn't essential, but it improves readability.

    You can achieve something similar with the standard re module by joining each list's items with | beforehand:

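    import re
    
    words = [['a', 'the', 'one'], ['reason', 'reasons'], ['for', 'of']]
    
    # Join each list into an alternation such as 'a|the|one';
    # re.escape makes any regex metacharacters in the words literal.
    words = ['|'.join(map(re.escape, x)) for x in words]
    
    # With re.X the spaces in the pattern are ignored; each ({}) becomes
    # a capturing group holding one alternation.
    pattern = r'\b ({}) \s+ ({}) \s+ ({}) \b'.format(*words)
    
    p = re.compile(pattern, re.X)
    
    print(p.search('the reasons for'))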
    
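If the word sets vary or come from data, the join step above can be wrapped in a small function. The name build_word_sequence is hypothetical (it is not part of re); this is a sketch that escapes each word with re.escape and lists longer alternatives first, since re tries the alternatives of a group from left to right and keeps the first one that leads to an overall match:

```python
import re


def build_word_sequence(word_sets):
    """Hypothetical helper: compile a pattern matching one word from each
    set, in order, separated by whitespace. Each set gets its own
    capturing group."""
    groups = []
    for words in word_sets:
        # Longer words first: re tries alternatives left to right, so this
        # prefers 'reasons' over its prefix 'reason' when both could match.
        ordered = sorted(words, key=len, reverse=True)
        # re.escape keeps characters such as '.' or '+' literal.
        groups.append('(' + '|'.join(map(re.escape, ordered)) + ')')
    # \b at both ends keeps the first and last words from matching
    # inside longer words.
    return re.compile(r'\b' + r'\s+'.join(groups) + r'\b')


p = build_word_sequence([['a', 'the', 'one'], ['reason', 'reasons'], ['for', 'of']])

m = p.search('That is the reasons for it')
print(m.groups())  # ('the', 'reasons', 'for')
print(p.search('the reasonable for'))  # None
```

In this particular pattern the \s+ after the second group would already force backtracking from 'reason' to 'reasons', so the ordering is a precaution for patterns where nothing follows the alternation.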
