Python: Extracting specific files with pattern from tar.gz without extracting the complete file

元气小坏坏 submitted on 2019-12-11 10:37:21

Question


I want to extract all files with the pattern *_sl_H* from many tar.gz files, without extracting all files from the archives.

I found the following snippet, but it only extracts a member by its exact name and does not support wildcards (https://pymotw.com/2/tarfile/):

import tarfile
import os

os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
t.extractall('outdir', members=[t.getmember('README.txt')])
print(os.listdir('outdir'))

Does someone have an idea? Many thanks in advance.


Answer 1:


First, use glob to get a list of the *.tar files in a given folder (for gzip-compressed archives, match *.tar.gz instead; mode 'r' reads both). Then, for each tar file, get the list of members and filter it with a regular expression. Finally, pass the filtered list to the members parameter of extractall():

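import glob
import re
import tarfile

# search() matches anywhere in the name, so no leading or trailing .*? is needed
reT = re.compile(r'_sl_H')

for tar_filename in glob.glob(r'\my_source_folder\*.tar'):
    try:
        # Mode 'r' detects compression automatically (plain tar, gzip, bz2, ...)
        t = tarfile.open(tar_filename, 'r')
    except (IOError, tarfile.TarError) as e:
        # Report unreadable or corrupt archives and move on to the next one
        print(e)
    else:
        # Extract only the matching members, then close the archive
        with t:
            t.extractall('outdir', members=[m for m in t.getmembers() if reT.search(m.name)])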
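
Since the question asks for a wildcard rather than a regular expression, the same filtering step can also be written with the standard-library fnmatch module. This is only a sketch of that alternative, not part of the original answer: the helper name extract_matching, the choice to match against the base name of each member, and the paths my_source_folder and outdir are assumptions made for illustration.

```python
import fnmatch
import glob
import os
import tarfile


def extract_matching(archive_path, pattern, outdir):
    """Extract regular files whose base name matches a shell-style pattern.

    Hypothetical helper: returns the archive names of the extracted members.
    """
    # 'r:*' lets tarfile detect gzip/bz2/xz compression on its own
    with tarfile.open(archive_path, 'r:*') as tar:
        wanted = [
            m for m in tar.getmembers()
            # fnmatchcase() is case-sensitive on every platform
            if m.isfile() and fnmatch.fnmatchcase(os.path.basename(m.name), pattern)
        ]
        # Only the selected members are written to disk
        tar.extractall(outdir, members=wanted)
    return [m.name for m in wanted]


if __name__ == '__main__':
    # 'my_source_folder' and 'outdir' are placeholder paths
    for path in glob.glob(os.path.join('my_source_folder', '*.tar.gz')):
        print(path, extract_matching(path, '*_sl_H*', 'outdir'))
```

Matching on the base name means directories inside the archive do not influence the result; to match against the full stored path instead, pass m.name to fnmatchcase().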



Answer 2:


Take a look at the TarFile.getmembers() method, which returns the members of the archive as a list. Once you have this list, you can use a condition to decide which files to extract.

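import os
import tarfile

os.makedirs('outdir', exist_ok=True)
# Mode 'r' detects compression automatically, so this also works for .tar.gz
with tarfile.open('example.tar', 'r') as t:
    for member in t.getmembers():
        # Extract only the members whose name contains the pattern
        if "_sl_H" in member.name:
            t.extract(member, "outdir")

print(os.listdir('outdir'))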
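
A variation on the loop above, in case the matching files should land directly in the output folder without the directory structure stored in the archive. This is a sketch under assumptions, not something the answer itself proposes: the helper name extract_flat is hypothetical, and it iterates over the TarFile object instead of calling getmembers() so that it does not need the full member list up front.

```python
import os
import shutil
import tarfile


def extract_flat(archive_path, substring, outdir):
    """Copy regular files whose name contains `substring` into outdir.

    Hypothetical helper: drops the directories stored in the archive and
    returns the paths of the files it wrote.
    """
    os.makedirs(outdir, exist_ok=True)
    written = []
    with tarfile.open(archive_path, 'r:*') as tar:
        # A TarFile is iterable and yields one TarInfo per member
        for member in tar:
            if not member.isfile() or substring not in member.name:
                continue
            target = os.path.join(outdir, os.path.basename(member.name))
            # extractfile() returns a readable file object for regular files
            with tar.extractfile(member) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

A call such as extract_flat('example.tar.gz', '_sl_H', 'outdir') would then write only the matching files. Note that two members with the same base name would overwrite each other in this flattened layout.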


Source: https://stackoverflow.com/questions/35865099/python-extracting-specific-files-with-pattern-from-tar-gz-without-extracting-th
