Python: Extracting specific files with pattern from tar.gz without extracting the complete file

梦如初夏 2021-01-20 11:47

I want to extract all files with the pattern *_sl_H* from many tar.gz files, without extracting all files from the archives.

I found these lines, but it

2 answers
  •  自闭症患者
    2021-01-20 12:29

    You can extract all files matching your pattern from many tar files as follows:

    1. Use glob to get a list of all the *.tar or *.gz files in a given folder.

    2. For each tar file, get a list of the files it contains using the getmembers() function.

    3. Use a regular expression (or a simple if "xxx" in test) to filter the required files.

    4. Pass this list of matching files to the members parameter in the extractall() function.

    5. Add exception handling so that a corrupt or unreadable tar file does not stop the loop.

    For example:

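    import glob
    import re
    import tarfile

    # Match any member whose name contains "_sl_H".
    reT = re.compile(r'_sl_H')

    # '*.tar*' picks up both .tar and .tar.gz archives in the folder.
    for tar_filename in glob.glob(r'\my_source_folder\*.tar*'):
        try:
            # Mode 'r' detects gzip compression transparently.
            with tarfile.open(tar_filename, 'r') as t:
                members = [m for m in t.getmembers() if reT.search(m.name)]
                t.extractall('outdir', members=members)
        except (OSError, tarfile.TarError) as e:
            # Unreadable or corrupt archives raise OSError or tarfile.TarError; report and continue.
            print(e)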
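
    As a variation on the example above, here is a minimal sketch that walks one archive lazily instead of calling getmembers() first, and matches the question's shell-style pattern *_sl_H* with fnmatch rather than a regular expression. The function name extract_matching and its parameters are made up for illustration, and matching against the base name only is an assumption about what is wanted.

```python
import fnmatch
import os
import tarfile


def extract_matching(archive_path, out_dir, pattern="*_sl_H*"):
    """Extract regular files whose base name matches `pattern`.

    Hypothetical helper for illustration; returns the extracted member names.
    """
    extracted = []
    # "r:*" opens the archive with transparent compression detection
    # (plain tar, gzip, bz2, xz).
    with tarfile.open(archive_path, "r:*") as tar:
        # Iterating over the TarFile yields members one header at a time,
        # so the full member list is not built before extraction starts.
        for member in tar:
            base_name = os.path.basename(member.name)
            if member.isfile() and fnmatch.fnmatchcase(base_name, pattern):
                tar.extract(member, path=out_dir)
                extracted.append(member.name)
    return extracted
```

    It could be called once per archive found by glob, for example extract_matching(tar_filename, 'outdir'). Note that a gzip-compressed archive still has to be decompressed sequentially to reach each member, so "without extracting everything" here means that only the matching files are written to disk. For archives from untrusted sources, on Python 3.12 and later it may be worth passing filter='data' to extract().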
    
