grep from tar.gz without extracting [faster one]

Happy的楠姐 2020-12-13 04:02

I am trying to grep for a pattern in a dozen .tar.gz files, but it is very slow.

I am using:

tar -ztf file.tar.gz | while read FILENAME
do
        if tar -zxf file         


        
8 answers
  •  北海茫月
    2020-12-13 04:26

    For starters, you could start more than one process:

    tar -ztf file.tar.gz | while IFS= read -r FILENAME
    do
            # Extract one member to stdout (-O) and search it in the background
            (if tar -zxf file.tar.gz "$FILENAME" -O | grep -l "string"
            then
                    echo "$FILENAME contains string"
            fi) &
    done
    

    The ( ... ) & construct runs each check in a background subshell, so the parent shell does not wait for one check to finish before starting the next.
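
Since the parent does not wait, a script may exit while some searches are still running. The sketch below shows one way to collect them with wait. It assumes a tar that supports -O (GNU tar and bsdtar do); the function name search_archive_parallel is hypothetical. The loop is wrapped in braces because many shells run the right-hand side of a pipeline in a subshell, and only that subshell can wait for the jobs it started.

```shell
# Sketch: search every member of an archive in its own background subshell,
# then wait until all of them have finished.
# search_archive_parallel is a hypothetical helper name.
search_archive_parallel() {
    archive=$1
    pattern=$2
    tar -ztf "$archive" | {
        while IFS= read -r member
        do
            # Directory entries end in "/" and have no content to search.
            case $member in */) continue ;; esac
            (
                if tar -zxf "$archive" -O "$member" | grep -q -- "$pattern"
                then
                    echo "$member contains $pattern"
                fi
            ) &
        done
        # Wait here, inside the pipeline's subshell, for all background jobs.
        wait
    }
}
```

It could be called as search_archive_parallel file.tar.gz "string". The order of the output lines is not deterministic, because the background jobs finish in arbitrary order.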

    After that, you should optimize the extraction of the archive. Reading the file itself is not the problem, as the OS should already have cached it. However, tar has to decompress and scan the archive again on every loop iteration, which can be slow. Unpacking the archive once and iterating over the result may help here:

    # mktemp -d creates the temporary directory itself
    tempPath=$(mktemp -d) &&
    tar -zxf file.tar.gz -C "$tempPath" &&
    find "$tempPath" -type f | {
            while IFS= read -r FILENAME
            do
                    (if grep -l "string" "$FILENAME"
                    then
                            echo "$FILENAME contains string"
                    fi) &
            done
            wait    # let the background greps finish before cleaning up
    }
    rm -r "$tempPath"
    

    find lists the files under the directory that tar extracted into; the loop then searches each of those files for the string.
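
Once the archive is unpacked, the find loop can also be replaced by a single recursive grep. The following is a minimal sketch under two assumptions: grep supports -r (GNU and BSD grep do, but it is not required by POSIX), and mktemp -d is available. The function name search_extracted is hypothetical.

```shell
# Sketch: unpack the archive once, search the whole tree with one grep,
# then remove the temporary directory.
# search_extracted is a hypothetical helper name.
search_extracted() {
    archive=$1
    pattern=$2
    tmpdir=$(mktemp -d) || return 1
    if tar -zxf "$archive" -C "$tmpdir"
    then
        # -r recurses into the directory, -l prints only the names of
        # matching files; cd first so the names are relative to the archive.
        (cd "$tmpdir" && grep -rl -- "$pattern" .)
    fi
    rm -rf "$tmpdir"
}
```

It could be called as search_extracted file.tar.gz "string". The trade-off is temporary disk space for the unpacked files in exchange for decompressing the archive only once.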

    Edit: Use grep -l to speed up things, as Jim pointed out. From man grep:

       -l, --files-with-matches
              Suppress normal output; instead print the name of each input file from which output would
              normally have been printed.  The scanning will stop on the first match.  (-l is specified
              by POSIX.)
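
One caveat when grep reads from a pipe, as in the first loop above: -l has no file name to report, so it prints the placeholder "(standard input)". The sketch below contrasts that with -q, which also stops at the first match but prints nothing. It assumes only a POSIX grep; the sample input line is made up for illustration.

```shell
# -l on piped input reports the placeholder name instead of a file name.
# LC_ALL=C keeps the placeholder untranslated.
label=$(printf 'a string here\n' | LC_ALL=C grep -l "string")
echo "$label"    # prints "(standard input)"

# -q prints nothing and only sets the exit status.
if printf 'a string here\n' | grep -q "string"
then
    echo "matched"
fi
```

In a loop that echoes the member name itself, -q avoids the extra "(standard input)" line in the output.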
    
