I have a list of files that contain particular patterns, but those files have been tarred. I now want to search for a pattern inside the tar file and find out which files contain it, without extracting them.
Any ideas?
The tar command has a -O switch that extracts files to standard output, so you can pipe that output to grep or awk:
tar xvf test.tar -O | awk '/pattern/{print}'
tar xvf test.tar -O | grep "pattern"
For example, to report the name of each file in which the pattern is found:
tar tf test.tar | while read -r FILE; do
    if tar xf test.tar "$FILE" -O | grep "pattern"; then
        echo "found pattern in: $FILE"
    fi
done
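The zgrep command searches gzip-compressed files directly, for example:
zgrep "mypattern" *.gz
Note that for a .tar.gz it searches the decompressed archive as a single stream, so it reports the name of the archive, not the names of the files inside it.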
GNU tar has --to-command. With it you can have tar pipe each file from the archive into the given command. For the case where you just want the lines that match, that command can be a simple grep. To know the filenames you need to take advantage of tar setting certain variables in the command's environment; for example:
tar xaf thing.tar.xz --to-command="awk -e '/thing.to.match/ {print ENVIRON[\"TAR_FILENAME\"] \":\", \$0}'"
Because I find myself using this often, I have this:
#!/bin/sh
set -eu

if [ $# -lt 2 ]; then
    echo "Usage: $(basename "$0") PATTERN ARCHIVE"
    exit 1
fi

# Colour the output only when writing to a terminal
if [ -t 1 ]; then
    h="$(tput setf 4)"
    m="$(tput setf 5)"
    f="$(tput sgr0)"
else
    h=""
    m=""
    f=""
fi

tar xaf "$2" --to-command="awk -e '/$1/{gsub(\"$1\", \"$m&$f\"); print \"$h\" ENVIRON[\"TAR_FILENAME\"] \"$f:\", \$0}'"
Python's tarfile module, along with TarFile.extractfile(), will allow you to inspect the tarball's contents without extracting it to disk.
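A minimal sketch of that approach, in case it helps. The helper names grep_tar and files_matching are made up for this example and are not part of tarfile. It assumes the members are text-like enough to be scanned line by line, and it matches against raw bytes so that binary members do not raise decoding errors.

```python
import re
import tarfile


def grep_tar(archive_path, pattern):
    """Yield (member_name, line_number, line) for every matching line.

    Hypothetical helper: nothing is written to disk, each member is
    read as a stream via TarFile.extractfile().
    """
    regex = re.compile(pattern.encode())  # search bytes, not decoded text
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, devices
            stream = archive.extractfile(member)
            if stream is None:
                continue
            with stream:
                for number, line in enumerate(stream, start=1):
                    if regex.search(line):
                        text = line.rstrip(b"\n").decode(errors="replace")
                        yield member.name, number, text


def files_matching(archive_path, pattern):
    """Return the sorted names of members with at least one match."""
    return sorted({name for name, _, _ in grep_tar(archive_path, pattern)})
```

Calling files_matching("test.tar", "pattern") would then give just the list of member names, while looping over grep_tar("test.tar", "pattern") gives output closer to grep -n.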
The easiest way is probably to use avfs. I've used this before for such tasks.
Basically, the syntax is:
avfsd ~/.avfs   # Sets up an avfs virtual filesystem
rgrep pattern ~/.avfs/path/to/file.tar#/
/path/to/file.tar is the path to the actual tar file. Prepending ~/.avfs/ (the mount point) and appending # lets avfs expose the tar file as a directory.