Question
I want to extract all files matching the pattern *_sl_H* from many tar.gz files, without extracting every file in the archives.
I found the following code, but it does not seem to support wildcards (https://pymotw.com/2/tarfile/):
import tarfile
import os

os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
# Extract a single, explicitly named member
t.extractall('outdir', members=[t.getmember('README.txt')])
print(os.listdir('outdir'))
Does someone have an idea? Many thanks in advance.
Answer 1:
First, use glob to get a list of the *.tar files in a given folder. Then, for each tar file, get the list of members and filter it with a regular expression. Finally, pass the filtered list to the members parameter of extractall, as follows:
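import tarfile
import glob
import re

# search() finds "_sl_H" anywhere in the name, so no leading/trailing .*? is needed
reT = re.compile(r'_sl_H')

# Adjust the glob pattern (e.g. *.tar.gz) to match your archives;
# mode 'r' detects compression transparently.
for tar_filename in glob.glob(r'\my_source_folder\*.tar'):
    try:
        t = tarfile.open(tar_filename, 'r')
    except (IOError, tarfile.TarError) as e:
        # Unreadable or invalid archive: report it and move on
        print(e)
    else:
        with t:  # close the archive when done
            t.extractall('outdir', members=[m for m in t.getmembers() if reT.search(m.name)])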
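Since the question asks for a wildcard rather than a regular expression, the same filtering can probably also be done with the standard fnmatch module. The following is only a minimal sketch, not part of the original answer: the function names extract_matching and extract_from_folder are hypothetical, and it assumes the archives end in .tar.gz and that the pattern should be tested against each member's base name rather than its full path.

```python
import fnmatch
import glob
import os
import posixpath
import tarfile


def extract_matching(archive_path, pattern, outdir):
    """Extract only regular files whose base name matches a shell-style pattern.

    Hypothetical helper; returns the names of the extracted members.
    """
    os.makedirs(outdir, exist_ok=True)
    # Mode 'r' lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(archive_path, 'r') as t:
        wanted = [
            m for m in t.getmembers()
            # Member names inside a tar archive use '/' as the separator.
            if m.isfile()
            and fnmatch.fnmatchcase(posixpath.basename(m.name), pattern)
        ]
        t.extractall(outdir, members=wanted)
    return [m.name for m in wanted]


def extract_from_folder(source_folder, pattern, outdir):
    """Apply extract_matching to every *.tar.gz file in source_folder.

    Assumption: archives are named *.tar.gz. Returns a dict mapping each
    archive path to the list of member names extracted from it.
    """
    extracted = {}
    archives = sorted(glob.glob(os.path.join(source_folder, '*.tar.gz')))
    for archive_path in archives:
        extracted[archive_path] = extract_matching(archive_path, pattern, outdir)
    return extracted
```

A call such as extract_from_folder('my_source_folder', '*_sl_H*', 'outdir') would then process every archive in the folder. fnmatchcase is used so that matching is case-sensitive on every platform; switch to fnmatch.fnmatch if platform-dependent case handling is preferred.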
Answer 2:
Take a look at the TarFile.getmembers() method, which returns the members of the archive as a list. Once you have this list, you can use a condition to decide which files to extract.
import tarfile
import os

os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
for member in t.getmembers():
    # Extract only members whose name contains "_sl_H"
    if "_sl_H" in member.name:
        t.extract(member, "outdir")
print(os.listdir('outdir'))
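The "decide with a condition" idea can be made reusable by passing the condition in as a function. This is a rough sketch under a few assumptions: extract_if is a hypothetical helper name, the condition is any callable that takes a tarfile.TarInfo and returns a bool, and os.makedirs with exist_ok=True replaces os.mkdir so that a second run does not fail when outdir already exists.

```python
import os
import tarfile


def extract_if(archive_path, outdir, condition):
    """Extract each member for which condition(member) is true.

    Hypothetical helper; `condition` receives a tarfile.TarInfo object.
    Returns the names of the extracted members.
    """
    os.makedirs(outdir, exist_ok=True)  # unlike os.mkdir, no error if it exists
    extracted = []
    # The with-block closes the archive even if extraction fails.
    with tarfile.open(archive_path, 'r') as t:
        for member in t.getmembers():
            if condition(member):
                t.extract(member, outdir)
                extracted.append(member.name)
    return extracted
```

For the case in the question, a call could look like extract_if('example.tar', 'outdir', lambda m: '_sl_H' in m.name).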
Source: https://stackoverflow.com/questions/35865099/python-extracting-specific-files-with-pattern-from-tar-gz-without-extracting-th