How to find patterns across multiple lines using grep?


I want to find files that contain "abc" AND "efg" in that order, where the two strings are on different lines of the file. E.g., a file with content:

blah bl


                      
26 answers

眼角桃花 (2020-11-22 04:21)

Sadly, you can't, at least not with grep in its default line-by-line mode. From the grep docs:

  grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.
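
Since grep looks at one line at a time, one workaround is to let a tool that keeps state between lines do the ordering check. Below is a minimal sketch using awk instead of grep. The helper name ordered_match is made up for illustration, and it assumes the two strings are the literal "abc" and "efg" from the question.

```shell
# Sketch: list files in which a line containing "abc" is followed,
# on a LATER line, by a line containing "efg".
# ordered_match is a hypothetical helper name, not a standard tool.
ordered_match() {
  for f in "$@"; do
    awk '
      seen && /efg/ { found = 1; exit }   # efg on a line after abc
      /abc/         { seen = 1 }          # remember that abc has appeared
      END           { exit !found }       # exit status 0 only on success
    ' "$f" && printf '%s\n' "$f"
  done
}

# Usage (file names are placeholders):
#   ordered_match *.txt
```

Because the efg rule is checked before the abc rule, a single line containing both strings does not count as a match on its own, which follows the "different lines" requirement in the question.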

                                                                        
                                                        
            
            
              
                
别跟我提以往 (2020-11-22 04:22)

grep on its own is not sufficient for this operation.

pcregrep, which is available on most modern Linux systems, can be used as follows:

pcregrep -M  'abc.*(\n|.)*efg' test.txt

where -M (--multiline) allows patterns to match more than one line. Note that this pattern also matches when abc and efg sit on the same line.

There is also a newer pcre2grep. Both are provided by the PCRE project.

pcre2grep is available for macOS via MacPorts as part of the pcre2 port:

% sudo port install pcre2

Via Homebrew, pcregrep comes with:

% brew install pcre

and pcre2grep with:

% brew install pcre2

pcre2grep is also available on Linux (Ubuntu 18.04+):

$ sudo apt install pcre2-utils # PCRE2
$ sudo apt install pcregrep    # Older PCRE

                                                                        
                                                        
            
            
              
                
北荒 (2020-11-22 04:22)

With The Silver Searcher (ag):

ag 'abc.*(\n|.)*efg'

This is similar to ring bearer's answer, but uses ag instead. The speed advantage of The Silver Searcher may pay off here.
                                                                        
                                                        
            
            
              
                
执笔经年 (2020-11-22 04:24)

I used this to extract a FASTA sequence from a multi-FASTA file, using grep's -P option:

grep -Pzo ">tig00000034[^>]+"  file.fasta > desired_sequence.fasta

-P enables Perl-compatible regular expressions
-z makes lines end at a NUL (zero) byte rather than a newline character
-o prints only what matched, since grep otherwise returns the whole line (which, because of -z, is the whole file)

The core of the regexp is [^>], which means "any character other than the greater-than symbol".
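
The same flags can be pointed back at the original question. This is only a sketch: it assumes GNU grep built with PCRE support (-P is not available in every grep), and the sample file names are invented for illustration.

```shell
# Assumes GNU grep with PCRE support (-P). File names are hypothetical.
dir=$(mktemp -d)
printf 'abc\nmiddle\nefg\n' > "$dir/match.txt"
printf 'efg\nabc\n'         > "$dir/wrong_order.txt"

# -z  : read the whole file as one record (records end at NUL bytes)
# -l  : print only the names of matching files
# (?s): let "." match newlines; the explicit \n forces abc and efg
#       onto different lines
grep -Pzl '(?s)abc.*\n.*efg' "$dir"/*.txt
```

With these two sample files, only the path of match.txt should be listed, since wrong_order.txt has efg before abc.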
                                                                        
                                                        
            
            
              
                
被撕碎了的回忆 (2020-11-22 04:27)

While the sed option is the simplest and easiest, LJ's one-liner is sadly not the most portable. Those stuck with a version of the C shell will need to escape their bangs:

sed -e '/abc/,/efg/\!d' [file]

Unfortunately, this escaped form does not work in bash and similar shells, where a backslash inside single quotes is passed to sed literally.
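
For readers on bash or another POSIX-style shell, the unescaped form of the range command behaves roughly as sketched here. The sample input is invented for illustration.

```shell
# In bash and other POSIX shells the "!" needs no backslash inside
# single quotes. Sample input is invented for illustration.
printf 'one\nabc here\ntwo\nefg here\nthree\n' |
  sed -e '/abc/,/efg/!d'
# prints the three lines from "abc here" through "efg here"
```

One caveat: if a line matches abc but no later line matches efg, sed keeps printing to the end of the input, so non-empty output alone does not prove that both strings were present.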
                                                                        
                                                        
            
            
              
                
再見小時候 (2020-11-22 04:27)

The file pattern *.sh matters because it keeps directories from being inspected; an explicit test inside the loop could do the same job.

for f in *.sh
do
  a=$( grep -n -m1 abc "$f" )
  test -n "${a}" && z=$( grep -n efg "$f" | tail -n 1 ) || continue
  (( ${z/:*/} - ${a/:*/} > 0 )) && echo "$f"
done

The

grep -n -m1 abc "$f"

stops after the first match (-m1) and prints it together with its line number (-n).
If a match was found (test -n ...), find the last match of efg by listing all matches and keeping the last one with tail -n 1:

z=$( grep -n efg "$f" | tail -n 1 )

Otherwise, continue with the next file.

Since each result looks something like 18:String alf="abc"; we need to cut away everything from the first ":" to the end of the line, which is what the ${z/:*/} expansion does.

${z/:*/} - ${a/:*/}

The difference is positive if the last match of the second expression comes after the first match of the first one.

Then we report the file name with echo "$f".
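
To make the intermediate values concrete, here is a small sketch on an invented sample file. It uses cut -d: -f1 in place of the ${z/:*/} expansion so that it also runs under plain sh; the variable names first and last are placeholders.

```shell
# Sample file invented for illustration; cut is used here in place of
# the ${z/:*/} expansion so the sketch also runs under plain sh.
f=$(mktemp)
printf 'x\nString alf="abc";\ny\nefg one\nefg two\n' > "$f"

grep -n -m1 abc "$f"            # first abc line, prefixed with its number
first=$(grep -n -m1 abc "$f" | cut -d: -f1)
last=$(grep -n efg "$f" | tail -n 1 | cut -d: -f1)

# Positive difference: the last efg comes after the first abc.
echo "$(( last - first ))"
```

Here the first abc is on line 2 and the last efg on line 5, so the difference printed is 3 and the file would be reported.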
                                                                        
                                                        
            
            
              
                