Performing grep operation in tar files without extracting

梦谈多话 2020-12-05 00:28

I have a list of files that contain particular patterns, but those files have been tarred. Now I want to search for the pattern in the tar file, and to know which files conta

6 answers
  • 2020-12-05 00:50

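    The zgrep command searches compressed files directly. Be aware that on a .tar.gz it searches the decompressed archive as one stream, so it can tell you that the archive matches, but not which archived file the match came from.

    For example: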

    zgrep "mypattern" *.gz
    

    http://linux.about.com/library/cmd/blcmdl1_zgrep.htm

  • 2020-12-05 00:50

    GNU tar has --to-command. With it you can have tar pipe each file from the archive into the given command. For the case where you just want the lines that match, that command can be a simple grep. To know the filenames you need to take advantage of tar setting certain variables in the command's environment; for example,

    tar xaf thing.tar.xz --to-command="awk -e '/thing.to.match/ {print ENVIRON[\"TAR_FILENAME\"] \":\", \$0}'"
    

    Because I find myself using this often, I have this:

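    #!/bin/sh
    # Print matching lines from every member of a tar archive,
    # each prefixed with the member's file name. Requires GNU tar and gawk.
    set -eu
    
    if [ $# -lt 2 ]; then
        echo "Usage: $(basename "$0") <pattern> <tarfile>" >&2
        exit 1
    fi
    
    # Colorize only when standard output is a terminal.
    if [ -t 1 ]; then
        h="$(tput setf 4)"   # file name color
        m="$(tput setf 5)"   # match color
        f="$(tput sgr0)"     # reset attributes
    else
        h=""
        m=""
        f=""
    fi
    
    # tar runs the command once per member and exports TAR_FILENAME to it.
    # The pattern is pasted into the awk program, so it must not contain
    # an unescaped "/" or quote characters.
    tar xaf "$2" --to-command="awk -e '/$1/{gsub(\"$1\", \"$m&$f\"); print \"$h\" ENVIRON[\"TAR_FILENAME\"] \"$f:\", \$0}'"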
    
  • 2020-12-05 00:51

    The easiest way is probably to use avfs. I've used this before for such tasks.

    Basically, the syntax is:

    avfsd ~/.avfs # Sets up an avfs virtual filesystem
    rgrep pattern ~/.avfs/path/to/file.tar#/
    

    /path/to/file.tar is the path to the actual tar file.

    Prepending ~/.avfs/ (the mount point) and appending # makes avfs expose the tar file as a directory.

  • 2020-12-05 00:59

    Python's tarfile module, along with TarFile.extractfile(), will allow you to inspect the tarball's contents without extracting it to disk.
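
    A minimal sketch of that approach, assuming the archived files are UTF-8 text (undecodable bytes are replaced rather than raising). The helper name grep_tar and its parameters are made up for illustration and are not part of the tarfile module:

```python
import re
import tarfile


def grep_tar(pattern, path=None, fileobj=None):
    """Yield (member_name, line_number, line) for every line matching pattern.

    Pass either a path to the archive or an open, seekable binary file object.
    Mode "r:*" lets tarfile detect gzip, bzip2 or xz compression by itself.
    """
    regex = re.compile(pattern)
    with tarfile.open(name=path, fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, devices
            # extractfile() returns a file-like reader; nothing is written to disk.
            stream = tar.extractfile(member)
            for number, raw in enumerate(stream, start=1):
                line = raw.decode("utf-8", errors="replace").rstrip("\n")
                if regex.search(line):
                    yield member.name, number, line
```

    Calling grep_tar("pattern", path="test.tar") and printing each tuple gives grep-like output with the member name in front. Members are read one line at a time, so large archives do not have to fit in memory.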

  • 2020-12-05 01:01

    The tar command has a -O switch that extracts files to standard output, so you can pipe that output to grep or awk:

    tar xvf  test.tar -O | awk '/pattern/{print}'
    
    tar xvf  test.tar -O | grep "pattern"
    

    For example, to print the name of each file in which the pattern is found:

    # List the members, then extract each one to stdout and grep it.
    tar tf test.tar | while read -r FILE
    do
        if tar xf test.tar -O "$FILE" | grep "pattern" ;then
            echo "found pattern in : $FILE"
        fi
    done
    
  • 2020-12-05 01:02

    That's actually very easy with ugrep option -z:

    -z, --decompress
            Decompress files to search, when compressed.  Archives (.cpio,
            .pax, .tar, and .zip) and compressed archives (e.g. .taz, .tgz,
            .tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, and .txz) are searched and
            matching pathnames of files in archives are output in braces.  If
            -g, -O, -M, or -t is specified, searches files within archives
            whose name matches globs, matches file name extensions, matches
            file signature magic bytes, or matches file types, respectively.
            Supported compression formats: gzip (.gz), compress (.Z), zip,
            bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2),
            lzma and xz (requires suffix .lzma, .tlz, .xz, .txz).
    

    For example:

    ugrep -z PATTERN archive.tgz
    

    This greps each of the archived files to display PATTERN matches with the archived filenames. Archived filenames are shown in braces to distinguish them from ordinary filenames. Everything else is the same as grep (ugrep has the same options and produces the same output). For example:

    $ ugrep -z "Hello" archive.tgz
    {Hello.bat}:echo "Hello World!"
    Binary file archive.tgz{Hello.class} matches
    {Hello.java}:public class Hello // prints a Hello World! greeting
    {Hello.java}:  { System.out.println("Hello World!");
    {Hello.pdf}:(Hello)
    {Hello.sh}:echo "Hello World!"
    {Hello.txt}:Hello
    

    If you just want the file names, use option -l (--files-with-matches) and customize the filename output with option --format="%z%~" to get rid of the braces:

    $ ugrep -z Hello -l --format="%z%~" archive.tgz
    Hello.bat
    Hello.class
    Hello.java
    Hello.pdf
    Hello.sh
    Hello.txt
    

    Tarballs (.tar.gz/.tgz, .tar.bz2/.tbz, .tar.xz/.txz, .tar.lzma/.tlz) are searched as well as .zip archives.
