Why does this regex take a long time to execute?

爱一瞬间的悲伤 2020-12-11 05:46

I found that, for example, this line takes a very long time to execute:

System.out.println(
        \".. .. .. .. .. .. .. .. ..  .. .. .. .. .. .. .. .         


        
1 Answer
  • 2020-12-11 06:29

    When a pattern x is made optional with the ? or * quantifier (or {0,}), the engine has two ways to proceed, depending on the kind of quantifier used:

    • Consume first, then backtrack to try the rest of the pattern (greedy quantifiers, e.g. .*, .?)
    • Consume nothing at first and immediately try the rest of the pattern, consuming more only if that fails (lazy quantifiers, e.g. .*?)
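The difference between the two strategies shows up in what a capturing group ends up holding. The following is a minimal sketch, not taken from the question: the class name `GreedyVsLazy`, the helper `firstGroup`, and the sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    /** Returns group 1 of the first match, or null when nothing matches. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "aXbYb"; // hypothetical sample input

        // Greedy: .* first consumes "XbYb", then backtracks one character
        // so that the final 'b' of the pattern can match.
        System.out.println(firstGroup("a(.*)b", input));  // prints XbY

        // Lazy: .*? first consumes nothing and grows one character at a time
        // until the 'b' of the pattern can match.
        System.out.println(firstGroup("a(.*?)b", input)); // prints X
    }
}
```

Both variants find a match here; they only differ in the order in which the engine explores the candidates, which is what matters once a match fails.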

    Someone who is not familiar with regular expressions, or does not care about performance, tends to throw .* in wherever something needs to match somewhere in the string. Engines are so fast at stepping back and forth that nothing seems odd or slow until the pattern cannot be found.

    Time complexity starts at O(n) and grows to roughly O(n^(2b)), where b is the nesting level of quantifiers. So on a failed match the number of steps the engine takes is huge.
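One way to see the growth on a failed match is to count how often the engine reads a character from the input. The sketch below is an illustration under assumptions: `BacktrackingSteps`, `CountingSequence`, and `readsOnFailure` are hypothetical names, the pattern `.*.*.*;` is a made-up example rather than the one from the question, and the number of `charAt` calls is only a rough proxy for engine steps. Exact counts depend on the JDK version, so none are quoted here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class BacktrackingSteps {
    /** Wraps a string and counts how often the regex engine reads a character. */
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads = 0;

        CountingSequence(String text) { this.text = text; }

        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public int length() { return text.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text; }
    }

    /** Counts character reads while the regex fails to match n copies of 'a'. */
    static long readsOnFailure(String regex, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        CountingSequence input = new CountingSequence(new String(chars));
        boolean matched = Pattern.compile(regex).matcher(input).matches();
        if (matched) throw new IllegalStateException("expected the match to fail");
        return input.reads;
    }

    public static void main(String[] args) {
        // Three greedy quantifiers followed by a character the input never contains.
        String regex = ".*.*.*;";
        for (int n : new int[] {10, 20, 40}) {
            System.out.println("n=" + n + " reads=" + readsOnFailure(regex, n));
        }
    }
}
```

Doubling the input length should increase the read count by much more than a factor of two, because every way of splitting the input among the three quantifiers is tried before the engine gives up.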

    To avoid such situations, keep a few guiding principles in mind:

    • Specify boundaries. If the pattern should stop somewhere before digits, do not use .*; use \D* instead.

    • Use conditions. You can check whether a pattern or letter x exists before running the whole match, using a lookahead such as ^(?=[^x]*x). This leads to an early failure.

    • Use possessive quantifiers or atomic groups (if available). Both prevent backtracking, which you often do not need.

    • Do not use (.*)+ or similar patterns. Instead, reconsider your requirements, or at least use an atomic group: (?>.*)+.
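The principles above can be sketched in Java, which supports lookaheads, possessive quantifiers, and atomic groups. This is an illustrative sketch only: the class `BacktrackingGuidelines`, its helper methods, and all sample inputs are hypothetical, and the patterns are deliberately tiny so that their behavior is easy to follow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackingGuidelines {
    /** Returns group 1 of the first match, or null when nothing matches. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    /** Cheap precheck: succeeds only if the input contains an 'x'. */
    static boolean containsX(String input) {
        return Pattern.compile("^(?=[^x]*x)").matcher(input).find();
    }

    public static void main(String[] args) {
        // Boundaries: .* overshoots into the digits and has to backtrack,
        // \D* stops at the first digit on its own.
        System.out.println(firstGroup("(.*)\\d+", "abc123"));   // prints abc12
        System.out.println(firstGroup("(\\D*)\\d+", "abc123")); // prints abc

        // Conditions: the lookahead fails early when no 'x' is present.
        System.out.println(containsX("abc"));  // prints false
        System.out.println(containsX("abxc")); // prints true

        // Possessive quantifier and atomic group: a+ keeps every 'a' it consumed,
        // so the literal "ab" that follows can no longer match.
        System.out.println(Pattern.matches("a+ab", "aaab"));      // prints true
        System.out.println(Pattern.matches("a++ab", "aaab"));     // prints false
        System.out.println(Pattern.matches("(?>a+)ab", "aaab"));  // prints false
    }
}
```

The last three lines also show the trade-off: possessive quantifiers and atomic groups change which inputs match, so they fit only where the pattern never needs to give characters back.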

    Your own regular expression is no exception. It suffers from too much greediness and too many optional matches, and it is worth taking the time to rework it.
