Python: consecutive lines between matches similar to awk

青春惊慌失措 2021-01-15 06:30
Given:

A multiline string named string (already read from a file named file)
Two patterns named pattern1 and pattern2

      
      
        
3 answers

醉话见心 (original poster) 2021-01-15 07:04

In awk, the /start/, /end/ range pattern prints every line from the one where /start/ matches through the one where /end/ matches, inclusive. It is a useful construct, and similar range operators exist in Perl, sed, Ruby and others.

To imitate a range operator in Python, write a class that remembers, from one call to the next, whether a start match has been seen without a following end match. We can use a regex (as awk does), or this can be trivially modified to use anything returning a True or False status for a line of data.

Given your example file, you can do:

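import re

class FlipFlop:
    ''' Class to imitate the behavior of the /start/, /end/ flip-flop in awk '''
    def __init__(self, start_pattern, end_pattern):
        self.patterns = start_pattern, end_pattern
        self.state = False  # True while inside a start..end range

    def __call__(self, st):
        ms = [e.search(st) for e in self.patterns]
        # Start and end on the same line: the line is in range and the range closes.
        if all(ms):
            self.state = False
            return True
        was_inside = self.state
        # While inside a range test the end pattern, otherwise the start pattern
        # (a bool indexes the list as 0 or 1).
        if ms[self.state]:
            self.state = not self.state
        return self.state or was_inside

with open('/tmp/file') as f:
    ff = FlipFlop(re.compile('b bb'), re.compile('d dd'))
    # Lines keep their own newlines, so suppress the one print would add.
    print(''.join(line for line in f if ff(line)), end='')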


Prints:

bbb bb b
ccc cc c
ffffd dd d


That retains a line-by-line file read with the flexibility of the /start/,/end/ regex range seen in other languages. Of course, you can use the same approach for a multiline string (assumed to be named s):

''.join(line+"\n" if ff(line) else "" for line in s.splitlines())
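
The remark above that the start and end tests can be anything returning True or False can be sketched with a small generator. This is only an illustration: the names between, is_start and is_end, the sample string, and the choice of str.startswith as the end test are assumptions, not part of the original answer.

```python
import re

def between(lines, is_start, is_end):
    """Yield lines from a start match through the next end match, inclusive.

    is_start and is_end are any callables that take a line and return a
    truthy or falsy value (hypothetical names chosen for this sketch).
    """
    inside = False
    for line in lines:
        if not inside and is_start(line):
            inside = True
        if inside:
            yield line
            # The end test also runs on the line that opened the range,
            # so a line matching both tests forms a one-line range.
            if is_end(line):
                inside = False

# Sample data assumed for illustration only.
s = "aaa aa a\nbbb bb b\nccc cc c\nddd dd d\neee ee e\n"

selected = list(between(s.splitlines(),
                        re.compile(r'b bb').search,           # regex as the start test
                        lambda line: line.startswith('ddd')))  # plain predicate as the end test
print("\n".join(selected))
```

Because the tests are ordinary callables, a compiled pattern's search method and a lambda can be mixed freely, and the generator works on any iterable of lines, whether an open file or s.splitlines().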


Idiomatically, in awk, you can get the same result as a flipflop using a flag:

$ awk '/b bb/{flag=1} flag{print $0} /d dd/{flag=0}' file


You can replicate that in Python as well (with more words):

import re

flag = False
with open('file') as f:
    for line in f:
        if re.search(r'b bb', line):
            flag = True    # start pattern: begin printing with this line
        if flag:
            print(line.rstrip())
        if re.search(r'd dd', line):
            flag = False   # end pattern: stop after this line


The same loop also works with an in-memory string: iterate over s.splitlines() instead of the file.
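
For the in-memory case, a minimal sketch of the flag loop might look like the following. The sample string s and the names kept and result are assumptions for illustration; the two patterns are the ones used above.

```python
import re

# Sample data assumed for illustration only.
s = "aaa aa a\nbbb bb b\nccc cc c\nddd dd d\neee ee e\n"

flag = False
kept = []
for line in s.splitlines():
    if re.search(r'b bb', line):
        flag = True          # start pattern seen: begin collecting
    if flag:
        kept.append(line)
    if re.search(r'd dd', line):
        flag = False         # end pattern seen: stop after this line

result = "\n".join(kept)
print(result)
```

Collecting into a list rather than printing inside the loop leaves the selected lines available for further processing.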

Or, you can use a multi-line regex:

import re

with open('/tmp/file') as f:
    print(''.join(re.findall(r'^.*b bb[\s\S]*d dd.*$', f.read(), re.M)))



But that requires reading the entire file into memory. Since you state the string has been read into memory, that is probably easiest in this case:

''.join(re.findall(r'^.*b bb[\s\S]*d dd.*$', s, re.M))
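
One difference from the awk range is worth illustrating: [\s\S]* is greedy, so if the text contains several start/end pairs, the regex above matches once, from the first start line through the last end line. A lazy quantifier ([\s\S]*?) gives one match per pair instead. The sketch below compares the two; the sample string and the names greedy and lazy are assumptions for illustration.

```python
import re

# Sample text with two start/end pairs (assumed for illustration only).
s = ("aaa aa a\nbbb bb b\nccc cc c\nddd dd d\n"
     "eee ee e\nxb bb x\nyd dd y\nfff ff f\n")

# Greedy: runs from the first start line to the LAST end line.
greedy = re.findall(r'^.*b bb[\s\S]*d dd.*$', s, re.M)

# Lazy: each match stops at the first end line after its start line.
lazy = re.findall(r'^.*b bb[\s\S]*?d dd.*$', s, re.M)

print(len(greedy))       # a single match that also contains "eee ee e"
print(len(lazy))         # one match per start/end pair
print("\n".join(lazy))   # join with newlines, since matches do not end in one
```

Unlike the awk range, neither regex returns anything for a start line that is never followed by an end line, so the line-by-line approaches remain the closer imitation when that case matters.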

    
             
                                                        
            
            
              
                