Binary grep on Linux?

你的背包 2020-12-13 18:53

Say I have generated the following binary file:

# generate file:
python -c 'import sys;[sys.stdout.write(chr(i)) for i in (0,0,0,0,2,4,6,8,0,1,3,0,5,20)]'


        
6 Answers
  • 2020-12-13 19:19

    Someone else appears to have been similarly frustrated and wrote their own tool to do it (or at least something similar): bgrep.

  • 2020-12-13 19:25

    This seems to work for me:

    grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>
    

    Short form:

    grep -obUaP "<\x-hex pattern>" <file>
    

    Example:

    grep -obUaP "\x01\x02" /bin/grep
    

    Output (Cygwin binary):

    153: <\x01\x02>
    33210: <\x01\x02>
    53453: <\x01\x02>
    

    You can then grep this output again to extract just the offsets. Remember to use binary mode for that pass as well.
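
    For comparison, the same kind of offset search can be sketched in Python without grep. This is only a minimal sketch and not part of the answer above: the function name `find_offsets` and the sample bytes are illustrative assumptions, and it expects the whole input to fit in memory.

```python
def find_offsets(data, pattern):
    """Return the byte offset of every match of pattern in data (overlaps included)."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        # Advance by one byte so overlapping matches are also reported.
        start = data.find(pattern, start + 1)
    return offsets


if __name__ == "__main__":
    # Hypothetical sample data; for a real file use open(path, "rb").read().
    sample = b"\x00\x01\x02\x00\x01\x02\x02"
    for offset in find_offsets(sample, b"\x01\x02"):
        print(offset)
```

    Note that this sketch reports overlapping matches, while grep -o reports non-overlapping ones, so the two can differ for self-overlapping patterns.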

  • 2020-12-13 19:27

    One-Liner Input

    Here’s the shorter one-liner version:

    % perl -ln0e 'print tell' < inputfile
    

    And here's a slightly longer one-liner:

    % perl -e '($/,$\) = ("\0","\n"); print tell while <STDIN>' < inputfile
    

    The way to connect those two one-liners is by deparsing the first one’s program:

    % perl -MO=Deparse,-p -ln0e 'print tell'
    BEGIN { $/ = "\000"; $\ = "\n"; }
    LINE: while (defined(($_ = <ARGV>))) {
        chomp($_);
        print(tell);
    }
    

    Programmed Input

    If you want to put that in a file instead of calling it from the command line, here’s a somewhat more explicit version:

    #!/usr/bin/env perl
    
    use English qw[ -no_match_vars ];
    
    $RS  = "\0";    # input  separator for readline, chomp
    $ORS = "\n";    # output separator for print
    
    while (<STDIN>) {
        print tell();
    }
    

    And here’s the really long version:

    #!/usr/bin/env perl
    
    use strict;
    use autodie;  # for perl5.10 or better
    use warnings qw[ FATAL all  ];
    
    use IO::Handle;
    
    IO::Handle->input_record_separator("\0");
    IO::Handle->output_record_separator("\n");
    
    binmode(STDIN);   # just in case
    
    while (my $null_terminated = readline(STDIN)) {
        # this is just *past* the null we just read:
        my $seek_offset = tell(STDIN);
        print STDOUT $seek_offset;
    }
    
    close(STDIN);
    close(STDOUT);
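
    For readers who do not know Perl, the idea behind the loops above can be sketched in Python: walk through the input and report the offset just past each null byte. This is a sketch under assumptions rather than a translation: the function name `offsets_after_nulls` and the 64 KiB chunk size are my own choices. The Perl loops should additionally print the end-of-file offset when the input ends with a record that has no trailing null, and this sketch does not.

```python
import io


def offsets_after_nulls(stream, chunk_size=65536):
    """Yield the offset just past each null byte read from a binary stream."""
    position = 0  # offset of the first byte of the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for index, byte in enumerate(chunk):
            if byte == 0:
                yield position + index + 1
        position += len(chunk)


if __name__ == "__main__":
    # The 14 test bytes from the question, held in memory for illustration.
    data = bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20])
    for offset in offsets_after_nulls(io.BytesIO(data)):
        print(offset)
```

    Reading in fixed-size chunks keeps memory use bounded regardless of file size. For a real file, pass a handle opened with open(path, "rb") or sys.stdin.buffer.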
    

    One-Liner Output

    BTW, to create the test input file, I didn’t use your big, long Python script; I just used this simple Perl one-liner:

    % perl -e 'print 0.0.0.0.2.4.6.8.0.1.3.0.5.20' > inputfile
    

    You’ll find that Perl often winds up being 2-3 times shorter than Python to do the same job. And you don’t have to compromise on clarity; what could be simpler than the one-liner above?

    Programmed Output

    I know, I know. If you don’t already know the language, this might be clearer:

    #!/usr/bin/env perl
    @values = (
        0,  0,  0,  0,  2,
        4,  6,  8,  0,  1,
        3,  0,  5, 20,
    );
    print pack("C*", @values);
    

    although this works, too:

    print chr for @values;
    

    as does

    print map { chr } @values;
    

    Although for those who like everything all rigorous and careful and all, this might be more what you would see:

    #!/usr/bin/env perl
    
    use strict;
    use warnings qw[ FATAL all ];
    use autodie;
    
    binmode(STDOUT);
    
    my @octet_list = (
        0,  0,  0,  0,  2,
        4,  6,  8,  0,  1,
        3,  0,  5, 20,
    );
    
    my $binary = pack("C*", @octet_list);
    print STDOUT $binary;
    
    close(STDOUT); 
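
    For comparison with pack("C*", ...), a Python 3 counterpart might look like the sketch below. The names `OCTETS` and `build_payload` are assumptions made for this illustration; the point is only that bytes() packs a list of small integers one byte each, and that writing through sys.stdout.buffer avoids any text encoding.

```python
import sys

# The same 14 octets used throughout this thread.
OCTETS = [0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20]


def build_payload(values):
    """Pack integers in the range 0-255 into a bytes object, one byte each."""
    return bytes(values)


if __name__ == "__main__":
    # Write through the underlying binary buffer so no text encoding is applied.
    sys.stdout.buffer.write(build_payload(OCTETS))
```

    Redirecting the script's output to a file, as with the Perl one-liner, should produce the same 14-byte test input.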
    

    TMTOWTDI

    Perl supports more than one way to do things so that you can pick the one that you’re most comfortable with. If this were something I planned to check in as a school or work project, I would certainly select the longer, more careful versions, or at least put a comment in the shell script if I were using the one-liners.

    You can find documentation for Perl on your own system. Just type

    % man perl
    % man perlrun
    % man perlvar
    % man perlfunc
    

    etc. at your shell prompt. If you want pretty-ish versions on the web instead, get the manpages for perl, perlrun, perlvar, and perlfunc from http://perldoc.perl.org.

  • 2020-12-13 19:29

    What about grep -a? I'm not sure how it behaves on truly binary files, but it works well on text files that the OS thinks are binary.

  • 2020-12-13 19:34

    One way to solve your immediate problem using only grep is to create a file containing a single null byte. After that, grep -abo -f null_byte_file target_file will produce the following output.

    0:
    1:
    2:
    3:
    8:
    11:
    

    That is, of course, each byte offset (as requested by "-b") followed by the matched null byte (as requested by "-o").
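
    If you would rather not craft the two files by hand, the setup can be sketched in Python. This is an illustration under assumptions: the helper names `write_fixture_files` and `expected_offsets` are mine, the file names are taken from the command above, and the script only prepares the files and computes the offsets you should expect; it does not run grep itself.

```python
import os
import tempfile

# The 14 test bytes from the question.
TEST_BYTES = bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20])


def write_fixture_files(directory):
    """Create the pattern file (one null byte) and the target file; return both paths."""
    pattern_path = os.path.join(directory, "null_byte_file")
    target_path = os.path.join(directory, "target_file")
    with open(pattern_path, "wb") as handle:
        handle.write(b"\x00")
    with open(target_path, "wb") as handle:
        handle.write(TEST_BYTES)
    return pattern_path, target_path


def expected_offsets(data):
    """Offsets of every null byte, for comparison with the grep output."""
    return [index for index, byte in enumerate(data) if byte == 0]


if __name__ == "__main__":
    # mkdtemp leaves the directory in place so the printed command can be run afterwards.
    directory = tempfile.mkdtemp()
    pattern_path, target_path = write_fixture_files(directory)
    print("grep -abo -f", pattern_path, target_path)
    print(expected_offsets(TEST_BYTES))
```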

    I'd be the first to advocate perl, but in this case there's no need to bring in the extended family.

  • 2020-12-13 19:38

    The bbe program is a sed-like editor for binary files. See documentation.

    Example with bbe:

    bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin
    
    11:x00 x00 xcc x00 x00 x00 xcd x00 x00 x00 xce
    

    Explanation

    -b search pattern between the slashes; each byte is written as \x followed by two hex digits.
       -b takes the form /pattern/:length, with the length (in bytes) after the matched pattern
    -s similar to 'grep -o': suppress unmatched output
    -e similar to 'sed -e': give a command
    -e 'F d' display the offset before each result (here: '11:')
    -e 'p h' print results in hexadecimal notation
    -e 'A \n' append an end-of-line to each result
    

    You can also pipe it to sed to have a cleaner output:

    bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin | sed -e 's/x//g'
    
    11:00 00 cc 00 00 00 cd 00 00 00 ce
    

    Your Perl solution from EDIT3 gives me an 'Out of memory' error with large files.

    bgrep has the same problem.

    The only downside to bbe is that I don't know how to print context that precedes a matched pattern.
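
    One possible way around both limitations, sketched in Python rather than with bbe: memory-map the file so it is not read up front, and slice out the bytes before and after each match. This is a hedged sketch, not a tested replacement: the function name `find_with_context`, its `before`/`after` parameters, and the sample bytes are assumptions for illustration, and I have not verified that it avoids the out-of-memory error on any particular system.

```python
import mmap
import os
import tempfile


def find_with_context(path, pattern, before=4, after=4):
    """Yield (offset, context_bytes) for each match of pattern in the file at path.

    The file is memory-mapped, so the operating system pages data in on demand
    instead of the script reading the whole file up front.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; mmap raises ValueError for an empty file.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            offset = mapped.find(pattern)
            while offset != -1:
                start = max(0, offset - before)
                end = min(len(mapped), offset + len(pattern) + after)
                yield offset, mapped[start:end]
                offset = mapped.find(pattern, offset + 1)


if __name__ == "__main__":
    # Hypothetical sample file standing in for mydata.bin.
    needle = b"\x00\x00\xcc\x00\x00\x00"
    sample = bytes(range(1, 9)) + needle + b"\xcd\x00\x00\x00\xce"
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "mydata.bin")
        with open(path, "wb") as handle:
            handle.write(sample)
        for offset, context in find_with_context(path, needle):
            # bytes.hex with a separator requires Python 3.8 or later.
            print(offset, context.hex(" "))
```

    The context slice includes the matched bytes themselves, with up to `before` bytes ahead of them and up to `after` bytes behind them, clipped at the file boundaries.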
