Regular expression: Match everything after a particular word

后端未结

关注

 4  1913

I am using Python and would like to match all the words after test till a period (full-stop) or space is encountered.

text = \"test : match this


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  终归单人心        
                
              
                            
                2020-12-03 06:44
              
            
            
                                                                       
I don't see why you want to use regex if you're just getting a subset from a string.

This works the same way:

if line.startswith('test:'):
    print(line[5:line.find('.')])


example:

>>> line = "test: match this."
>>> print(line[5:line.find('.')])
 match this


Regex is slow, it is awkward to design, and difficult to debug. There are definitely occassions to use it, but if you just want to extract the text between test: and ., then I don't think is one of those occasions.

See: https://softwareengineering.stackexchange.com/questions/113237/when-you-should-not-use-regular-expressions

For more flexibility (for example if you are looping through a list of strings you want to find at the beginning of a string and then index out) replace 5 (the length of 'test:') in the index with len(str_you_looked_for).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清歌不尽        
                
              
                            
                2020-12-03 07:02
              
            
            
                                                                       
In a general case, as the title mentions, you may capture with (.*) pattern any 0 or more chars other than newline after any pattern(s) you want:

import re
p = re.compile(r'test\s*:\s*(.*)')
s = "test : match this."
m = p.search(s)           # Run a regex search anywhere inside a string
if m:                     # If there is a match
    print(m.group(1))     # Print Group 1 value


If you want . to match across multiple lines, compile the regex with re.DOTALL or re.S flag (or add (?s) before the pattern):

p = re.compile(r'test\s*:\s*(.*)', re.DOTALL)
p = re.compile(r'(?s)test\s*:\s*(.*)')


However, it will retrun match this.. See also a regex demo.

You can add \. pattern after (.*) to make the regex engine stop before the last . on that line:

test\s*:\s*(.*)\.


Watch out for re.match() since it will only look for a match at the beginning of the string (Avinash aleady pointed that out, but it is a very important note!)

See the regex demo and a sample Python code snippet:

import re
p = re.compile(r'test\s*:\s*(.*)\.')
s = "test : match this."
m = p.search(s)           # Run a regex search anywhere inside a string
if m:                     # If there is a match
    print(m.group(1))     # Print Group 1 value


If you want to make sure test is matched as a whole word, add \b before it (do not remove the r prefix from the string literal, or '\b' will match a BACKSPACE char!) - r'\btest\s*:\s*(.*)\.'.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  遇见更好的自我        
                
              
                            
                2020-12-03 07:04
              
            
            
                                                                       
You need to use re.search since re.match tries to match from the beging of the string. To match until a space or period is encountered.

re.search(r'(?<=test :)[^.\s]*',text)


To match all the chars until  a period is encountered,

re.search(r'(?<=test :)[^.]*',text)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  陌清茗        
                
              
                            
                2020-12-03 07:04
              
            
            
                                                                       
Everything after test, including test

test.*


Everything after test, without test

(?<=test).*


Example here on regexr.com
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复