Binary grep on Linux?

Say I have generated the following binary file:

# generate file:
python -c 'import sys;[sys.stdout.write(chr(i)) for i in (0,0,0,0,2,4,6,8,0,1,3,0,5,20)]'

6 Answers

感动是毒 (2020-12-13 19:19)

Someone else appears to have been similarly frustrated and wrote their own tool to do it (or at least something similar): bgrep.
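I can't vouch for bgrep's exact command-line options, but the core idea is small. Here is a minimal Python sketch of a bgrep-like search. It assumes the pattern is given as a hex string such as "0102". The names `bgrep` and `bgrep_file` and the hex-string convention are illustrative only; they are not bgrep's actual interface. The sketch also reads the whole file into memory, so it is not suited to very large files.

```python
from pathlib import Path


def bgrep(data: bytes, hex_pattern: str) -> list[int]:
    """Return the start offset of every occurrence of a hex pattern in data.

    hex_pattern is a hex string such as "0102"; bytes.fromhex also
    accepts spaces between byte pairs ("01 02"). Overlapping matches are
    reported, because the search restarts one byte after each hit.
    """
    needle = bytes.fromhex(hex_pattern)
    if not needle:
        raise ValueError("pattern must contain at least one byte")
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def bgrep_file(path: str, hex_pattern: str) -> list[int]:
    # Hypothetical helper: loads the entire file into memory first.
    return bgrep(Path(path).read_bytes(), hex_pattern)
```

For the question's file, `bgrep(data, "0000")` returns `[0, 1, 2]`. Unlike `grep -o`, which reports non-overlapping matches, this sketch also finds the overlapping hit at offset 1.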

长情又很酷 (2020-12-13 19:25)

This seems to work for me:

grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>


Short form:

grep -obUaP "<\x-hex pattern>" <file>


Example:

grep -obUaP "\x01\x02" /bin/grep


Output (Cygwin binary):

153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>


So you can grep this output again to extract the offsets. Just don't forget to use binary mode (-aU) in that second grep as well.
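To make the meaning of those flags concrete, here is a rough Python analogue. It is a sketch, not a reimplementation of grep. Together, `-o` and `-b` yield one (byte offset, matched bytes) pair per non-overlapping match, and keeping only the first element of each pair is the "extract offsets" step. The function name `grep_obuap` is made up for illustration. Python's `re` module, operating on bytes, stands in for grep's PCRE engine, so advanced PCRE-only syntax may not carry over.

```python
import re


def grep_obuap(data: bytes, pattern: bytes) -> list[tuple[int, bytes]]:
    """Rough analogue of `grep -obUaP PATTERN FILE` on raw bytes.

    Returns (byte_offset, matched_bytes) for each non-overlapping match,
    which is what -o (only matching) and -b (byte offset) print.
    """
    return [(m.start(), m.group()) for m in re.finditer(pattern, data)]


# The pattern is a bytes regex, so \x01\x02 means the two raw octets 0x01 0x02.
data = bytes((0, 1, 2, 9, 1, 2, 1, 2))
matches = grep_obuap(data, rb"\x01\x02")
offsets = [offset for offset, _ in matches]  # the "grep this again" step
print(offsets)  # [1, 4, 6]
```

Because both the pattern and the data are bytes objects, the match is done on raw octets rather than on decoded characters.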

走了就别回头了 (2020-12-13 19:27)

One-Liner Input

Here’s the shorter one-liner version:

% perl -ln0e 'print tell' < inputfile


And here's a slightly longer one-liner:

% perl -e '($/,$\) = ("\0","\n"); print tell while <STDIN>' < inputfile


You can see how those two one-liners are connected by deparsing (decompiling) the first one’s program with B::Deparse:

% perl -MO=Deparse,-p -ln0e 'print tell'
BEGIN { $/ = "\000"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
    chomp($_);
    print(tell);
}
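It helps to spell out what these one-liners actually print. With `$/` set to "\0", each read consumes everything up to and including the next null byte, so `tell` reports the position just past that null. A final record with no trailing null is still read, and `tell` then equals the file size. The following Python sketch mimics that arithmetic without Perl. The function name is hypothetical.

```python
def tell_after_records(data: bytes, sep: bytes = b"\0") -> list[int]:
    """Mimic perl -ln0e 'print tell': the file position after each record.

    Each record ends just after the next separator; a trailing record
    without a separator ends at the end of the data.
    """
    positions = []
    pos = 0
    while pos < len(data):
        nul = data.find(sep, pos)
        # Step past the separator, or jump to EOF for an unterminated last record.
        pos = len(data) if nul == -1 else nul + len(sep)
        positions.append(pos)
    return positions


data = bytes((0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20))
print(tell_after_records(data))  # [1, 2, 3, 4, 9, 12, 14]
```

To get the offsets of the nulls themselves, subtract 1 from each value. The exception is a final end-of-file position (14 here), which does not correspond to a null.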


Programmed Input

If you want to put that in a file instead of calling it from the command line, here’s a somewhat more explicit version:

#!/usr/bin/env perl

use English qw[ -no_match_vars ];

$RS  = "\0";    # input  separator for readline, chomp
$ORS = "\n";    # output separator for print

while (<STDIN>) {
    print tell();
}


And here’s the really long version:

#!/usr/bin/env perl

use strict;
use autodie;  # for perl5.10 or better
use warnings qw[ FATAL all  ];

use IO::Handle;

IO::Handle->input_record_separator("\0");
IO::Handle->output_record_separator("\n");

binmode(STDIN);   # just in case

while (my $null_terminated = readline(STDIN)) {
    # tell() now points just *past* the null we just read:
    my $seek_offset = tell(STDIN);
    print STDOUT $seek_offset;
}

close(STDIN);
close(STDOUT);


One-Liner Output

BTW, to create the test input file, I didn’t use your big, long Python script; I just used this simple Perl one-liner:

% perl -e 'print 0.0.0.0.2.4.6.8.0.1.3.0.5.20' > inputfile


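You’ll find that Perl often winds up being 2-3 times shorter than Python for the same job, and you don’t have to compromise on clarity; what could be simpler than the one-liner above? (The dotted literal is a Perl v-string: with two or more dots, each number becomes the character with that ordinal, so print writes exactly those 14 bytes.)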

Programmed Output

I know, I know.  If you don’t already know the language, this might be clearer:

#!/usr/bin/env perl
@values = (
    0,  0,  0,  0,  2,
    4,  6,  8,  0,  1,
    3,  0,  5, 20,
);
print pack("C*", @values);


although this works, too:

print chr for @values;


as does 

print map { chr } @values;


For those who like everything rigorous and careful, this might be closer to what you would see:

#!/usr/bin/env perl

use strict;
use warnings qw[ FATAL all ];
use autodie;

binmode(STDOUT);

my @octet_list = (
    0,  0,  0,  0,  2,
    4,  6,  8,  0,  1,
    3,  0,  5, 20,
);

my $binary = pack("C*", @octet_list);
print STDOUT $binary;

close(STDOUT); 
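For readers coming from Python, `pack("C*", @values)` (one unsigned char per value) corresponds to `struct.pack` with one `B` format code per value, or simply to `bytes(values)`. Here is a short sketch under that reading. The helper `write_binary` and any filename you pass to it are hypothetical placeholders.

```python
import struct

VALUES = (0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20)

# One "B" (unsigned char, 0-255) per value, mirroring Perl's pack("C*", ...).
packed = struct.pack(f"{len(VALUES)}B", *VALUES)

# bytes() builds the same octet string directly from the integers.
assert packed == bytes(VALUES)


def write_binary(path: str, octets: bytes) -> None:
    # Binary mode ("wb") plays the role of binmode(STDOUT) in the Perl version.
    with open(path, "wb") as out:
        out.write(octets)
```

Calling `write_binary("inputfile", packed)` would produce the same 14-byte test file.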


TMTOWTDI

Perl supports more than one way to do things, so you can pick the one you’re most comfortable with. If this were something I planned to check in as a school or work project, I would certainly choose the longer, more careful versions, or at least put a comment in the shell script if I were using the one-liners.

You can find documentation for Perl on your own system.  Just type

% man perl
% man perlrun
% man perlvar
% man perlfunc


etc. at your shell prompt. If you want nicer-looking versions on the web instead, read the perl, perlrun, perlvar, and perlfunc manpages at http://perldoc.perl.org.

我在风中等你 (2020-12-13 19:29)

What about grep -a? I'm not sure how it works on truly binary files, but it works well on text files that grep mistakes for binary.

攒了一身酷 (2020-12-13 19:34)

One way to solve your immediate problem using only grep is to create a file containing a single null byte.  After that, grep -abo -f null_byte_file target_file will produce the following output.

0:
1:
2:
3:
8:
11:


That is, each byte offset (requested by -b) followed by the matched null byte itself (requested by -o), which does not show up on the terminal.

I'd be the first to advocate Perl, but in this case there's no need to bring in the extended family.
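For comparison, the same offsets are easy to compute outside grep. Here is a minimal Python sketch using the 14-byte test file from the question. The function name is hypothetical.

```python
def null_byte_offsets(data: bytes) -> list[int]:
    """Offsets of every NUL byte, i.e. the numbers grep -abo prints above."""
    return [i for i, byte in enumerate(data) if byte == 0]


data = bytes((0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20))
print(null_byte_offsets(data))  # [0, 1, 2, 3, 8, 11]
```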

孤街浪徒 (2020-12-13 19:38)

The bbe program is a sed-like editor for binary files. See its documentation.

Example with bbe:

bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin

11:x00 x00 xcc x00 x00 x00 xcd x00 x00 x00 xce


Explanation

-b        block definition, written /pattern/:length: the search pattern goes
          between //, with each byte in hexadecimal as \xHH, and length is the
          block size in bytes taken from the matched pattern
-s        similar to 'grep -o': suppress unmatched output
-e        similar to 'sed -e': give a command
-e 'F d'  display the offset before each result, here: '11:'
-e 'p h'  print results in hexadecimal notation
-e 'A \n' append an end-of-line to each result


You can also pipe it through sed for cleaner output:

bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin | sed -e 's/x//g'

11:00 00 cc 00 00 00 cd 00 00 00 ce



  Your solution with Perl from your EDIT3 gives me an 'Out of memory'
  error with large files.
  
  The same problem occurs with bgrep.
  
  The only downside to bbe is that I don't know how to print context that precedes a matched pattern.
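This answer leaves two issues open: memory use on large files, and printing context before a match. The following is a Python sketch for both, not a bbe recipe. It memory-maps the file, so the operating system pages it in on demand instead of the script reading it all at once. For each match it emits `before` bytes of leading context and `after` bytes of trailing context, using the same offset:hex style as the bbe output above. The function names and parameters are hypothetical. Note that `mmap` refuses to map an empty file.

```python
import mmap
from typing import Iterator


def hex_context(data, needle: bytes, before: int = 0, after: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (offset, hex dump) for each match of needle in data.

    data can be bytes or an mmap object (both support find() and slicing).
    The dump covers `before` bytes ahead of the match, the match itself,
    and `after` bytes following it, clamped to the data boundaries.
    """
    pos = data.find(needle)
    while pos != -1:
        start = max(0, pos - before)
        end = min(len(data), pos + len(needle) + after)
        yield pos, data[start:end].hex(" ")
        pos = data.find(needle, pos + 1)  # restart one byte later (overlaps allowed)


def hex_context_in_file(path: str, needle: bytes, before: int = 0, after: int = 0) -> list[tuple[int, str]]:
    # Map the file read-only; pages are loaded lazily as find() scans them.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return list(hex_context(mm, needle, before, after))
```

Hypothetical usage: `for offset, dump in hex_context_in_file("mydata.bin", b"\x00\x00\xcc\x00\x00\x00", before=4, after=5): print(f"{offset}:{dump}")`.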