Very slow regular expression search


南旧 2020-12-16 17:58

I'm not sure I completely understand what is going on with the following regular expression search:

>>> import re
>>> template = re.compil


      
      
        
3 answers

借酒劲吻你 (original poster) 2020-12-16 18:34

The slowness is caused by backtracking in the regex engine, triggered by the nested quantifier:

(\w+)+\.


Backtracking occurs with this pattern whenever there is no . after the word characters in your string. The engine first matches as many \w as possible, then, finding no . to match, gives back one character at a time and retries:

(a x 59) .
(a x 58) .
...
(a) .


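Eventually the match fails. However, the second (outer) + in your pattern also lets the engine split the same run of characters across several repetitions of the group, so before giving up it tries every such split. For a run of n characters there are 2^(n-1) of them, not just n: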

(a x 58) (a) .
(a x 57) (a) (a) .
(a x 57) (a x 2) .
...
(a) (a) (a) (a) (a) (a) (a) ...
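
To get a feel for how fast the number of paths grows, the following sketch enumerates every way to cut a run of n characters into consecutive non-empty groups, which is what the outer + allows. The helper name `splits` is made up for this illustration and only models the counting argument, not the engine's internals.

```python
from itertools import combinations

def splits(n):
    """Yield every way to cut a run of n characters into consecutive,
    non-empty groups, as a list of group lengths."""
    for k in range(n):  # k = number of cut points between characters
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

# The count doubles with each extra character: 2 ** (n - 1).
for n in range(1, 8):
    print(n, sum(1 for _ in splits(n)))
```

For the 59-character run in the listing above, that is 2^58 splits, which is why the failing search appears to hang.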


Removing the outer + prevents this explosion of backtracking without changing which strings match:

(\w+)\.
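
A rough way to check this is to time both patterns on strings of word characters with no trailing dot. This is only a sketch: the names `nested`, `flat` and `time_failed_match` are made up here, and the absolute timings depend on your machine and Python version. The input lengths are kept small on purpose, since each extra character roughly doubles the time for the nested pattern.

```python
import re
import time

nested = re.compile(r'(\w+)+\.')  # quantifier nested inside a quantifier
flat = re.compile(r'(\w+)\.')     # outer + removed

def time_failed_match(pattern, text):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    return result, elapsed

for n in (12, 16, 20):
    text = 'a' * n  # no '.' anywhere, so both patterns must fail
    _, slow = time_failed_match(nested, text)
    _, fast = time_failed_match(flat, text)
    print(f'n={n:2d}  nested: {slow:.4f}s  flat: {fast:.6f}s')

# On input that does match, both patterns capture the same text.
print(nested.match('abc.').group(1), flat.match('abc.').group(1))
```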


Some implementations also support possessive quantifiers, which may be a better fit for this scenario because they never give back characters they have already matched:

(\w++)\.
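
In Python, the standard `re` module accepts possessive quantifiers from version 3.11 onward; older versions reject `\w++` with an `re.error`. The sketch below tries the possessive form and, as an assumption made for this example only, falls back to the plain `(\w+)\.` pattern when it is not supported.

```python
import re

try:
    # Possessive form: \w++ never backtracks into what it matched.
    pattern = re.compile(r'(\w++)\.')
except re.error:
    # Older Python: fall back to the non-nested pattern instead.
    pattern = re.compile(r'(\w+)\.')

print(pattern.match('abc.').group(1))  # captured word before the dot
print(pattern.match('a' * 60))         # no dot, so the match fails quickly
```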

    
             
                                                        
            
            
              
                