In Python Regular Expressions, re.compile("x"*50000) gives me OverflowError: regular expression code size limit exceeded
The difference is that ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 can be reduced to ".*?", while "x"*50000 has to generate 50000 nodes in the FSM (or a similar structure used by the regex engine).
EDIT: OK, I was wrong; the engine is not that smart. The reason "x"*50000 fails but ".*?x"*50000 doesn't is that there is a limit on the size of a single "code item" in the compiled regex, not on the total size of the compiled code. "x"*50000 generates one long item, while ".*?x"*50000 generates many small ones. If you could split the string literal somehow without changing the meaning of the regex, it would work, but I can't think of a way to do that.
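To check this on your own interpreter, here is a minimal sketch (assuming Python 3.6+ for the f-string) that tries to compile both patterns and reports whether the compiler raises OverflowError. try_compile is a hypothetical helper written for this demo, not part of the re module. Whether "x"*50000 actually overflows can depend on the Python version and build, so the sketch prints what happens instead of assuming the error.

```python
import re


def try_compile(pattern):
    """Compile `pattern` with re.compile.

    Returns None on success, or a short error string if the regex
    compiler raises OverflowError (e.g. "code size limit exceeded").
    Hypothetical helper for this demo only.
    """
    try:
        re.compile(pattern)
    except OverflowError as exc:
        return f"OverflowError: {exc}"
    return None


# One long literal: per the explanation above, a single large code item.
long_literal = "x" * 50000
# Many small pieces: 50000 short ".*?x" items instead of one big one.
many_small = ".*?x" * 50000

for label, pattern in [("'x'*50000", long_literal), ("'.*?x'*50000", many_small)]:
    error = try_compile(pattern)
    print(label, "->", error or "compiled OK")
```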