grep from tar.gz without extracting [faster one]

Happy的楠姐 2020-12-13 04:02

I'm trying to grep a pattern from a dozen .tar.gz files, but it's very slow.

I am using:

tar -ztf file.tar.gz | while read FILENAME
do
        if tar -zxf file.tar.gz "$FILENAME" -O | grep "string" > /dev/null
        then
                echo "$FILENAME contains string"
        fi
done

8 answers
  •  一向 (original poster)
    2020-12-13 04:25

    I'm trying to grep a pattern from a dozen .tar.gz files, but it's very slow

    tar -ztf file.tar.gz | while read FILENAME
    do
            if tar -zxf file.tar.gz "$FILENAME" -O | grep "string" > /dev/null
            then
                    echo "$FILENAME contains string"
            fi
    done
    

    That's actually very easy with ugrep option -z:

    -z, --decompress
            Decompress files to search, when compressed.  Archives (.cpio,
            .pax, .tar, and .zip) and compressed archives (e.g. .taz, .tgz,
            .tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, and .txz) are searched and
            matching pathnames of files in archives are output in braces.  If
            -g, -O, -M, or -t is specified, searches files within archives
            whose name matches globs, matches file name extensions, matches
            file signature magic bytes, or matches file types, respectively.
            Supported compression formats: gzip (.gz), compress (.Z), zip,
            bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2),
            lzma and xz (requires suffix .lzma, .tlz, .xz, .txz).
    

    It takes just one command to search file.tar.gz:

    ugrep -z "string" file.tar.gz
    

    This greps each of the archived files to display matches. Archived filenames are shown in braces to distinguish them from ordinary filenames. For example:

    $ ugrep -z "Hello" archive.tgz
    {Hello.bat}:echo "Hello World!"
    Binary file archive.tgz{Hello.class} matches
    {Hello.java}:public class Hello // prints a Hello World! greeting
    {Hello.java}:  { System.out.println("Hello World!");
    {Hello.pdf}:(Hello)
    {Hello.sh}:echo "Hello World!"
    {Hello.txt}:Hello
    

    If you just want the file names, use option -l (--files-with-matches) and customize the filename output with option --format="%z%~" to get rid of the braces:

    $ ugrep -z Hello -l --format="%z%~" archive.tgz
    Hello.bat
    Hello.class
    Hello.java
    Hello.pdf
    Hello.sh
    Hello.txt
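
    If ugrep is not installed, the loop from the question can still be sped up. It is slow because every iteration runs tar again, so the archive is decompressed and scanned once per member. The following is only a rough sketch of a single-pass alternative, assuming GNU tar: its --to-command option pipes each regular member to a command and exports the member name as TAR_FILENAME. The function name grep_targz and the variables PATTERN and ARCHIVE are made up for this illustration.

```shell
#!/bin/sh
# Sketch: print the members of .tar.gz archives that contain a pattern,
# decompressing each archive only once.
# Assumes GNU tar (--to-command and TAR_FILENAME are GNU extensions).
# grep_targz, PATTERN and ARCHIVE are hypothetical names for this example.
grep_targz() {
    pattern=$1
    shift
    for archive in "$@"; do
        # tar runs the command once per regular member, with the member's
        # contents on stdin. grep -q stops at the first match, so cat drains
        # whatever is left; the final true keeps tar from warning about a
        # non-zero exit status when a member does not match.
        PATTERN=$pattern ARCHIVE=$archive tar -xzf "$archive" \
            --to-command='grep -q -e "$PATTERN" && printf "%s: %s\n" "$ARCHIVE" "$TAR_FILENAME"; cat >/dev/null; true'
    done
}
```

    Called as grep_targz "string" *.tar.gz, this should print one "archive: member" line per matching member. It still starts one grep process per member, so on archives with many small files ugrep -z will most likely remain the faster option.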
    
