Catastrophic backtracking shouldn't be happening on this regex

抹茶落季 2020-12-03 09:47

Can someone explain why Java's regex engine goes into catastrophic backtracking mode on this regex? Every alternation is mutually exclusive with every other alternation, fr

3 Answers
  •  没有蜡笔的小新
    2020-12-03 10:00

    I have to admit this surprised me, too, but I get the same result in RegexBuddy: it quits trying after a million steps. I know the warnings about catastrophic backtracking tend to focus on nested quantifiers, but in my experience alternation is at least as dangerous. In fact, if I change the last part of your regex from this:

    '(?:[^']+|'')+'
    

    ...to this:

    '(?:[^']*(?:''[^']*)*)'
    

    ...it fails in only eleven steps. This is an example of Friedl's "unrolled loop" technique, which he breaks down like this:

    opening normal * ( special normal * ) * closing
       '     [^']        ''     [^']           '
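
To make the template concrete, here is a minimal Java sketch that assembles an unrolled loop from its four parts. The class name `UnrolledLoop` and the helper `unroll` are hypothetical names used only for illustration, and the sketch assumes each fragment is a self-contained atom (a single character, a character class, or an already grouped expression), so that appending `*` to `normal` behaves as intended.

```java
import java.util.regex.Pattern;

class UnrolledLoop {
    /**
     * Builds: opening normal* (?: special normal* )* closing
     * Assumption: every argument is a self-contained regex atom. Nothing here
     * checks that special and normal are mutually exclusive; the caller has to
     * guarantee that.
     */
    static Pattern unroll(String opening, String normal, String special, String closing) {
        String regex = opening
                + normal + "*"
                + "(?:" + special + normal + "*)*"
                + closing;
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        // The single-quoted string case from this answer.
        Pattern quoted = unroll("'", "[^']", "''", "'");
        System.out.println(quoted.pattern());
        System.out.println(quoted.matcher("'it''s'").matches());
        System.out.println(quoted.matcher("'abc").matches());
    }
}
```

One difference worth keeping in mind: because `normal*` may match nothing, the unrolled form also accepts an empty literal (`''` on its own), which the `(?:[^']+|'')+` version rejects.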
    

    The nested stars are safe as long as:

    1. special and normal can never match the same thing,
    2. special always matches at least one character, and
    3. special is atomic (there must be only one way for it to match).

    The regex will then fail to match with minimal backtracking, and succeed with no backtracking at all. The alternation version, on the other hand, is almost guaranteed to backtrack, and where no match is possible it quickly spirals out of control as the length of the target string increases. If it doesn't backtrack excessively in some flavors, it's because they have optimizations built in specifically to counter this problem--something very few flavors do, so far.
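
One rough way to observe the difference in Java is to wrap the input in a `CharSequence` that counts how many characters the matcher reads. The sketch below is only an approximation: the count is a proxy for work done and is not the same thing as RegexBuddy's step count, the class and method names (`BacktrackingProbe`, `CountingSequence`, `readsDuringFind`, `unterminated`) are hypothetical, and the test input (an opening quote that is never closed) is an assumption, since the full regex and input from the question are not shown here. The exact numbers vary by JDK version, so none are quoted.

```java
import java.util.regex.Pattern;

class BacktrackingProbe {
    /** Wraps a String and counts every character the regex engine reads. */
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads;

        CountingSequence(String text) { this.text = text; }

        @Override public int length() { return text.length(); }

        @Override public char charAt(int index) {
            reads++;
            return text.charAt(index);
        }

        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }

        @Override public String toString() { return text; }
    }

    // The two fragments discussed in this answer.
    static final Pattern ALTERNATION = Pattern.compile("'(?:[^']+|'')+'");
    static final Pattern UNROLLED = Pattern.compile("'(?:[^']*(?:''[^']*)*)'");

    /** Runs find() once and returns how many characters were read. */
    static long readsDuringFind(Pattern pattern, String input) {
        CountingSequence seq = new CountingSequence(input);
        pattern.matcher(seq).find();
        return seq.reads;
    }

    /** Assumed failing input: an opening quote followed by n letters, never closed. */
    static String unterminated(int n) {
        StringBuilder sb = new StringBuilder("'");
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Kept short on purpose so the alternation version finishes quickly.
        String input = unterminated(20);
        System.out.println("alternation reads: " + readsDuringFind(ALTERNATION, input));
        System.out.println("unrolled reads:    " + readsDuringFind(UNROLLED, input));
    }
}
```

Increasing the argument to `unterminated` a few characters at a time should show how each version scales; keep the increments small, because the alternation version can become very slow on some JDKs.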
