How do I extract only the file of a .tar.gz member?

落花浮王杯 submitted on 2019-12-09 20:19:54

Question


My goal is to unpack a .tar.gz file and not its sub-directories leading up to the file.

My code is based off this question except instead of unpacking a .zip I am unpacking a .tar.gz file.

I am asking this question because the error I'm getting is very vague and doesn't identify the problem in my code:

import os
import shutil
import tarfile

with tarfile.open('RTLog_20150425T152948.gz', 'r:gz') as tar:
    for member in tar.getmembers():
        filename = os.path.basename(member.name)
        if not filename:
            continue

        # copy file (taken from zipfile's extract)
        source = member
        target = open(os.path.join(os.getcwd(), filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)

As you can see I copied the code from the linked question and tried to change it to deal with .tar.gz members instead of .zip members. Upon running the code I get the following error:

Traceback (most recent call last):
  File "C:\Users\dzhao\Desktop\123456\444444\blah.py", line 27, in <module>
    with source, target:
AttributeError: __exit__

From the reading I've done, shutil.copyfileobj takes as input two "file-like" objects. member is a TarInfo object. I'm not sure if a TarInfo object is a file-like object so I tried changing this line from:

source = member #to
source = open(os.path.join(os.getcwd(), member.name), 'rb')

But this understandably raised an error where the file wasn't found.

What am I not understanding?


Answer 1:


This code has worked for me:

import os
import shutil
import tarfile

with tarfile.open(fname, "r|*") as tar:
    counter = 0

    for member in tar:
        if member.isfile():
            filename = os.path.basename(member.name)
            if filename != "myfile": # do your check
                continue

            # extractfile() returns a file-like object limited to this member's data
            with tar.extractfile(member) as source, open("output.file", "wb") as output:
                shutil.copyfileobj(source, output)

            break # got our file

        counter += 1
        if counter % 1000 == 0:
            tar.members = [] # free ram... yes we have to do this manually

However, your problem might not be the extraction itself: your file may not actually be a .tar.gz archive but just a plain .gz file (its name ends in .gz, not .tar.gz).
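One way to handle both cases, as a rough sketch: tarfile.is_tarfile() reports whether a path can be opened as a tar archive, and a plain .gz file can be decompressed with gzip.open(). The helper name unpack_flat and its dest_dir argument are hypothetical, and stripping the ".gz" suffix to get the output name is an assumption.

```python
import gzip
import os
import shutil
import tarfile


def unpack_flat(path, dest_dir):
    """Write the regular files found in `path` directly into dest_dir.

    Hypothetical helper: accepts a tar archive (optionally compressed) or a
    plain .gz file, drops any leading sub-directories, and returns the list
    of paths written.
    """
    written = []
    if tarfile.is_tarfile(path):
        with tarfile.open(path, "r:*") as tar:
            for member in tar:
                filename = os.path.basename(member.name)
                # Skip directories, links, and entries without a base name.
                if not member.isfile() or not filename:
                    continue
                target = os.path.join(dest_dir, filename)
                with tar.extractfile(member) as source, open(target, "wb") as out:
                    shutil.copyfileobj(source, out)
                written.append(target)
    else:
        # Assumption: anything that is not a tar archive is a plain gzip file,
        # and the output name is the input name without its ".gz" suffix.
        name = os.path.basename(path)
        if name.endswith(".gz"):
            name = name[:-3]
        target = os.path.join(dest_dir, name)
        with gzip.open(path, "rb") as source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
        written.append(target)
    return written
```

Note that members with the same base name in different sub-directories would overwrite each other in this sketch.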

Edit: Also, you're getting the error on the with line because Python is trying to use the member object as a context manager, and a TarInfo object has no __enter__ or __exit__ method.
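To illustrate the point about the with line, here is a small self-contained sketch. It builds a throwaway in-memory archive (the member name and contents are made up for the example) and shows that a TarInfo object only describes an entry, while tar.extractfile(member) returns the readable, file-like object that shutil.copyfileobj expects.

```python
import io
import tarfile

# Build a tiny in-memory .tar.gz so the example needs no files on disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="logs/2015/example.txt")  # made-up member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buffer.seek(0)

with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    member = tar.getmembers()[0]

    # TarInfo holds metadata (name, size, type); it cannot be read from
    # and cannot be used in a with statement.
    is_file_like = hasattr(member, "read")
    is_context_manager = hasattr(member, "__enter__") and hasattr(member, "__exit__")

    # extractfile() gives a file-like object over the member's data.
    with tar.extractfile(member) as source:
        content = source.read()
```

In the question's code, replacing source = member with source = tar.extractfile(member) should therefore be enough for the with statement and the copy to work, provided the file really is a tar archive.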



Source: https://stackoverflow.com/questions/37752400/how-do-i-extract-only-the-file-of-a-tar-gz-member
