Python's Regular Expression Source String Length

生来不讨喜 2020-12-04 02:25

In Python Regular Expressions,

re.compile("x"*50000)

gives me OverflowError: regular expression code size limit exceeded

2 answers
  • 2020-12-04 02:59

    If you want to match 50000 consecutive "x" characters, here is an alternative that avoids regex:

    # mystring is the string being searched
    if "x"*50000 in mystring:
        print("found")
    

    If you do want to match 50000 "x" characters with a regex, use a counted repetition, {n}:

    >>> pat=re.compile("x{50000}")
    >>> pat.search(s)
    <_sre.SRE_Match object at 0xb8057a30>
    

    On my system (Python 2.6), the largest repetition count it accepts is 65535:

    >>> pat=re.compile("x{65536}")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/re.py", line 188, in compile
        return _compile(pattern, flags)
      File "/usr/lib/python2.6/re.py", line 241, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python2.6/sre_compile.py", line 529, in compile
        groupindex, indexgroup
    RuntimeError: invalid SRE code
    >>> pat=re.compile("x{65535}")
    >>>
    

    I don't know whether Python offers any setting to raise that limit, though.
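The two approaches in this answer can be put side by side in a small script. This is a minimal sketch, not the answerer's code: has_run and try_compile are hypothetical helper names, and because the limits differ between Python versions and builds, the compile step catches the error instead of assuming an outcome.

```python
import re


def has_run(text, char, count):
    """Non-regex check: does text contain `count` consecutive copies of char?"""
    return char * count in text


def try_compile(pattern):
    """Return (compiled, None) on success or (None, error) if compiling fails."""
    try:
        return re.compile(pattern), None
    except (OverflowError, RuntimeError, re.error) as exc:
        # Which exception is raised, and at what size, depends on the interpreter.
        return None, exc


text = "ab" + "x" * 50000 + "cd"

# Plain substring test, no regex engine involved.
print(has_run(text, "x", 50000))

# Counted repetition; may be rejected on builds with a lower repeat limit.
pat, err = try_compile("x{50000}")
if pat is not None:
    print(pat.search(text) is not None)
else:
    print("x{50000} was rejected:", err)
```

The substring test does not depend on any regex limit, so it is the safer choice when the pattern is only a fixed run of one character.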

  • 2020-12-04 03:17

    The difference is that ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 can be reduced to ".*?", while "x"*50000 has to generate 50000 nodes in the FSM (or a similar structure used by the regex engine).

    EDIT: OK, I was wrong. It's not that smart. The reason "x"*50000 fails but ".*?x"*50000 doesn't is that there is a limit on the size of a single "code item". "x"*50000 generates one long item, while ".*?x"*50000 generates many small items. If you could somehow split the string literal without changing the meaning of the regex, it would work, but I can't think of a way to do that.
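One possible way to split the pattern without changing its meaning, as the edit above asks for, is to nest counted repeats so that no single count is large: (?:x{1000}){50} matches the same text as x{50000}. The sketch below is an assumption, not something verified on the Python 2.6 build from the first answer, and chunked_repeat is a hypothetical helper name.

```python
import re


def chunked_repeat(literal, total, chunk=1000):
    """Build a pattern matching `literal` repeated exactly `total` times,
    expressed as nested counted repeats instead of one long literal run."""
    atom = "(?:%s)" % re.escape(literal)
    full, rest = divmod(total, chunk)
    parts = []
    if full:
        # `full` blocks of `chunk` repetitions each.
        parts.append("(?:%s{%d}){%d}" % (atom, chunk, full))
    if rest:
        # Leftover repetitions that do not fill a whole block.
        parts.append("%s{%d}" % (atom, rest))
    return "".join(parts)


pattern = chunked_repeat("x", 50000)
print(pattern)

pat = re.compile(pattern)
# match() anchors at the start, so each check scans the input only once.
print(pat.match("x" * 50000) is not None)
print(pat.match("x" * 49999) is not None)
```

The pattern text stays a few dozen characters long regardless of the total, which is the point of the split. Whether the compiled form stays under a given interpreter's limits still has to be tested on that interpreter.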
