How to grep for a pattern in the files in tar archive without filling up disk space

别说谁变了你拦得住时间么 submitted on 2019-12-18 04:55:17

Question


I have a very large tar archive (~5 GB).

I want to grep for a pattern in all the files in the archive (and also print the name of each file that contains the pattern), but I do not want to fill up my disk by extracting the archive.

Is there any way I can do that?

I tried these, but this does not give me the file names that contain the pattern, just the matching lines:

tar -O -xf test.tar.gz | grep 'this'
tar -xf test.tar.gz --to-command='grep awesome'

Also, where is this feature of tar documented? tar xf test.tar $FILE


Answer 1:


Here's my take on this:

while read filename; do tar -xOf file.tar "$filename" | grep 'pattern' | sed "s|^|$filename:|"; done < <(tar -tf file.tar | grep -v '/$')

Broken out for explanation:

  • while read filename; do -- it's a loop...
  • tar -xOf file.tar "$filename" -- this extracts each file...
  • | grep 'pattern' -- here's where you put your pattern...
  • | sed "s|^|$filename:|"; -- prepend the filename, so this looks like grep. Salt to taste.
  • done < <(tar -tf file.tar | grep -v '/$') -- end the loop, and get the list of files to feed to your while read.

One proviso: this breaks if you have OR bars (|) in your filenames.
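The breakage on | comes from splicing the filename into the sed expression. One possible way around it is to let grep print the prefix itself. This is only a sketch, assuming GNU grep (for --label) and a tar that can write a named member to stdout with -O; the function name targrep_label is made up for illustration.

```shell
# Hypothetical helper: targrep_label PATTERN ARCHIVE
# Assumes GNU grep (--label) and tar's -O (extract to stdout).
targrep_label() {
  local pattern="$1" archive="$2" filename
  # List members and skip directory entries (names ending in "/").
  tar -tf "$archive" | grep -v '/$' | while IFS= read -r filename; do
    # grep reads the member from stdin; --label supplies the name that -H prints.
    # grep exits 1 when a member has no match, which is not an error here.
    tar -xOf "$archive" "$filename" \
      | grep -H --label="$filename" -- "$pattern" || true
  done
}
```

Called as targrep_label 'pattern' file.tar, it should print lines in the same filename:match form as the loop above, without passing the name through sed.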

Hmm. In fact, this makes a nice little bash function, which you can append to your .bashrc file:

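targrep() {
  # Usage: targrep pattern file ...
  local pattern="$1" archive filename taropt=""

  if [[ $# -lt 2 ]]; then
    echo "Usage: targrep pattern file ..." >&2
    return 1
  fi
  shift   # drop the pattern; the remaining arguments are archives

  for archive in "$@"; do

    if [[ ! -f "$archive" ]]; then
      echo "targrep: $archive: No such file" >&2
      continue
    fi

    case "$archive" in
      *.tar.gz) taropt="-z" ;;
      *) taropt="" ;;
    esac

    # List the members (skipping directories), then extract each one to stdout.
    while IFS= read -r filename; do
      tar $taropt -xOf "$archive" "$filename" \
       | grep -- "$pattern" \
       | sed "s|^|$filename:|"
    done < <(tar $taropt -tf "$archive" | grep -v '/$')

  done
}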



Answer 2:


Seems like nobody posted this simple solution that processes the archive only once:

tar xzf archive.tgz --to-command \
    'grep --label="$TAR_FILENAME" -H PATTERN ; true'

Here tar passes the name of each file in the TAR_FILENAME environment variable (see the docs), and grep uses it to label each match. The trailing true is added so that tar doesn't complain about failing to extract files that don't match.
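If you want this as a reusable command, one way is to wrap it in a small function. This is a sketch under a few assumptions: GNU tar (for --to-command and TAR_FILENAME, and for detecting compression by itself when reading), GNU grep (for --label), and two made-up names, targrep_once and TARGREP_PATTERN.

```shell
# Hypothetical wrapper: targrep_once PATTERN ARCHIVE
# Assumes GNU tar (--to-command, TAR_FILENAME) and GNU grep (--label).
targrep_once() {
  # Pass the pattern through the environment so it needs no extra quoting
  # inside the command string. TARGREP_PATTERN is a name invented for this sketch.
  TARGREP_PATTERN="$1" tar -xf "$2" --to-command \
    'grep -H --label="$TAR_FILENAME" -- "$TARGREP_PATTERN"; true'
}
```

Usage would look like targrep_once PATTERN archive.tgz. The archive is still read only once, and nothing is written to disk.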




Answer 3:


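Here's a bash function that may work for you. Add the following to your ~/.bashrc:

targrep () {
    # Read member names line by line so names with spaces survive; skip directories.
    tar -tzf "$1" | grep -v '/$' | while IFS= read -r i; do
        # grep prints nothing for members without a match.
        tar -Oxzf "$1" "$i" | grep --label="$i" -H -- "$2"
    done
}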

Usage:

targrep archive.tar.gz "pattern"



Answer 4:


It's incredibly hacky, but you could abuse tar's -v option to process and delete each file as it is extracted.

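grep_and_delete() {
  if [ -n "$1" ] && [ -f "$1" ]; then
    grep -H 'this' -- "$1" </dev/null
    rm -f -- "$1" </dev/null
  fi
}
mkdir tmp && cd tmp
# This assumes the archive sits in the parent directory of tmp.
tar -xvzf ../test.tar.gz | (
  prev=''
  # tar may print a name before it has finished writing that file,
  # so only the previously listed file is processed on each pass.
  while IFS= read -r pathname; do
    grep_and_delete "$prev"
    prev="$pathname"
  done
  grep_and_delete "$prev"
)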



Answer 5:


# Replace PATTERN with your search pattern; prints the names of matching files.
tar -tf test.tar.gz | grep -v '/$' | \
xargs -I _ \
sh -c 'tar -xOf test.tar.gz "_" | grep -q "PATTERN" && echo "_"'



Answer 6:


Try:

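    tar tvf name_of_file | grep --regexp="pattern"

The t option lists the contents of the tar file without extracting the files. The v is verbose, and the f names the archive file to read. Since nothing is extracted, this uses no extra disk space, but note that grep only sees the listing (file names, sizes, dates), not the contents of the files.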
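To make clear what this kind of command searches, here is a small sketch that greps the member listing only. The function name tar_list_grep is made up for illustration, and it uses plain -t rather than -tv so that only the names are matched.

```shell
# Hypothetical helper: tar_list_grep PATTERN ARCHIVE
# Prints archive member names that match PATTERN; member contents are never read.
tar_list_grep() {
  # grep exits 1 when no name matches, which is not treated as an error here.
  tar -tf "$2" | grep -- "$1" || true
}
```

With an archive holding notes.txt (whose text contains "needle") and needle.log, tar_list_grep needle archive.tar would report only needle.log, because the pattern is compared with names, not with file contents.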




Answer 7:


This may help:

zcat log.tar.gz | grep -a -i "string"

zgrep -i "string" log.tar.gz

http://www.commandlinefu.com/commands/view/9261/grep-compressed-log-files-without-extracting



Source: https://stackoverflow.com/questions/13041068/how-to-grep-for-a-pattern-in-the-files-in-tar-archive-without-filling-up-disk-sp
