Python regex pattern max length in re.compile?

别跟我提以往 2020-12-07 02:12

I am trying to compile a large pattern with re.compile in Python 3.

The pattern consists of 500 short words (I want to remove them from a te

1 Answer
  • 2020-12-07 02:40

    Consider this example:

    import re

    # Build 1000 alternatives: \b1000\b|\b1001\b|...|\b1999\b
    stop_list = (r"\b" + str(s) + r"\b" for s in range(1000, 2000))
    stopstring = "|".join(stop_list)
    stopword_pattern = re.compile(stopstring)
    

    If you try to print the pattern, you'll see something like

    >>> print(stopword_pattern)
    re.compile('\\b1000\\b|\\b1001\\b|\\b1002\\b|\\b1003\\b|\\b1004\\b|\\b1005\\b|\\b1006\\b|\\b1007\\b|\\b1008\\b|\\b1009\\b|\\b1010\\b|\\b1011\\b|\\b1012\\b|\\b1013\\b|\\b1014\\b|\\b1015\\b|\\b1016\\b|\\b1017\\b|\)
    

    which seems to indicate that the pattern is incomplete. However, this is only a limitation of how compiled pattern objects are displayed: their __repr__ shortens long patterns (CPython appears to cut the displayed pattern at roughly 200 characters), while the compiled pattern itself is complete. If you match against a part of the pattern that is "missing" from the printout, it still succeeds:

    >>> stopword_pattern.match("1999")
    <_sre.SRE_Match object; span=(0, 4), match='1999'>
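
To check this without relying on the printed form, one option is to compare the compiled object's .pattern attribute with the string that was passed in. The sketch below rebuilds the same pattern as in the example above. The sample_text string and the use of sub() to strip the matches are illustrative assumptions, not part of the original question.

```python
import re

# Same construction as in the example above: \b1000\b|\b1001\b|...|\b1999\b
stopstring = "|".join(r"\b" + str(n) + r"\b" for n in range(1000, 2000))
stopword_pattern = re.compile(stopstring)

# The compiled object keeps the complete source string in .pattern,
# even though repr() displays only a shortened prefix of it.
full_length = len(stopword_pattern.pattern)
shown_length = len(repr(stopword_pattern))

# An alternative far beyond the displayed prefix still matches.
last_match = stopword_pattern.match("1999")

# Hypothetical usage: remove the "stop words" from a sample text.
sample_text = "keep 999 drop 1500 keep 2000 drop 1999"
cleaned = stopword_pattern.sub("", sample_text)

print(full_length, shown_length)
print(last_match.group(0))
print(cleaned)
```

If the stop list holds real words rather than numbers, passing each word through re.escape before joining guards against characters that have a special meaning in a regex.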
    