Very slow regular expression search

南旧 2020-12-16 17:58

I'm not sure I completely understand what is going on with the following regular expression search:

>>> import re
>>> template = re.compil         


        
3 answers
  •  借酒劲吻你
    2020-12-16 18:34

    The slowness is caused by backtracking of the engine:

    (\w+)+\.
    

    Backtracking occurs naturally with this pattern when there is no . at the end of your string. The engine first matches as many \w characters as possible, then gives them back one at a time once it finds that the . it still needs cannot match at the end of the string.

    (a x 59) .
    (a x 58) .
    ...
    (a) .
    

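    Finally it will fail to match. However, the second + in your pattern lets the engine split the run of word characters into groups in every possible way, so for n characters it inspects on the order of 2^(n-1) possible paths before giving up: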

    (a x 58) (a) .
    (a x 57) (a) (a) .
    (a x 57) (a x 2) .
    ...
    (a) (a) (a) (a) (a) (a) (a) ...
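
To get a feel for how quickly these paths multiply, here is a minimal sketch that counts the ways a run of n word characters can be split into consecutive non-empty groups, which is what the outer + in (\w+)+ allows. The function name count_splits is made up for illustration; it only counts the splits and does not run the regex engine.

```python
def count_splits(n):
    """Count the ways to split n characters into consecutive non-empty groups."""
    # ways[k] holds the number of splits of a k-character run.
    # An empty run has exactly one split: no groups at all.
    ways = [1] + [0] * n
    for k in range(1, n + 1):
        # The first group takes `first` characters; the rest is split recursively.
        ways[k] = sum(ways[k - first] for first in range(1, k + 1))
    return ways[n]


print(count_splits(5))               # 16
print(count_splits(59) == 2 ** 58)   # True
```

Each extra character doubles the count, which is why a subject of a few dozen characters is already enough to make the search appear to hang.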
    

    Removing the outer + prevents this excessive backtracking:

    (\w+)\.
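
A short sketch of the simplified pattern in Python, assuming a subject of 59 word characters with no trailing dot (the subject string is a stand-in, since the original one is not shown here):

```python
import re

# Hypothetical subject: 59 word characters and no "." at the end.
subject = "a" * 59

# Single quantifier: on failure the engine only gives back one character
# at a time, so the number of attempts grows linearly with the input.
fixed = re.compile(r"(\w+)\.")

no_match = fixed.match(subject)   # fails quickly instead of hanging
match = fixed.match("abc.")       # still matches when the dot is present

print(no_match)         # None
print(match.group(1))   # abc
```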
    

    Some implementations also support possessive quantifiers, which may be a better fit for this particular scenario:

    (\w++)\.
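
In Python, the re module accepts possessive quantifiers from version 3.11 onward. The sketch below uses \w++ when available and otherwise falls back to a lookahead plus backreference, a common workaround that is swapped in here and is not part of the answer above. The subject string is again a stand-in.

```python
import re
import sys

# Hypothetical subject: 59 word characters and no "." at the end.
subject = "a" * 59

if sys.version_info >= (3, 11):
    # Possessive quantifier: once \w++ has matched, it never gives characters back.
    possessive = re.compile(r"(\w++)\.")
else:
    # Workaround for older versions: the lookahead captures the whole run
    # and is not re-entered on failure; \1 then consumes exactly that run.
    possessive = re.compile(r"(?=(\w+))\1\.")

print(possessive.match(subject))           # None
print(possessive.match("abc.").group(1))   # abc
```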
    
