Python's Regular Expression Source String Length

生来不讨喜 2020-12-04 02:25

In Python regular expressions,

re.compile("x"*50000)

raises OverflowError: regular expression code size limit exceeded

2 Answers
  •  失恋的感觉
    2020-12-04 03:17

    The difference is that ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 can be reduced to ".*?", while "x"*50000 has to generate 50000 nodes in the FSM (or a similar structure used by the regex engine).

    EDIT: OK, I was wrong. It's not that smart. The reason "x"*50000 fails but ".*?x"*50000 doesn't is that there is a limit on the size of a single "code item". "x"*50000 generates one long item, whereas ".*?x"*50000 generates many small ones. If you could somehow split the string literal without changing the meaning of the regex, it would work, but I can't think of a way to do that.
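To check how your own interpreter handles the two patterns, here is a minimal sketch. The helper name `try_compile` is made up for illustration and is not part of the `re` module. The limit seems to depend on the Python version and build, so on a recent release both patterns may well compile without error.

```python
import re


def try_compile(pattern):
    """Return (compiled, None) on success, or (None, message) on failure.

    Hypothetical helper for this example only.
    """
    try:
        return re.compile(pattern), None
    except (OverflowError, re.error) as exc:
        # OverflowError is the exception reported in the question;
        # re.error covers ordinary pattern syntax problems.
        return None, f"{type(exc).__name__}: {exc}"


# Compare the single long literal with the pattern made of many small items.
for label, pattern in [('"x"*50000', "x" * 50000),
                       ('".*?x"*50000', ".*?x" * 50000)]:
    compiled, error = try_compile(pattern)
    print(label, "->", "compiled" if compiled is not None else error)
```

If the first pattern fails on your setup and the second compiles, that matches the "one long item versus many small items" explanation above.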
