How to read from a zip file within zip file in Python?

僤鯓⒐⒋嵵緔 提交于 2019-11-27 14:22:28

问题


I have a file that I want to read that is itself zipped within a zip archive. For example, parent.zip contains child.zip, which contains child.txt. I am having trouble reading child.zip. Can anyone correct my code?

I assume that I need to create child.zip as a file-like object and then open it with a second instance of zipfile, but being new to python my zipfile.ZipFile(zfile.open(name)) is silly. It raises a zipfile.BadZipfile: "File is not a zip file" on (independently validated) child.zip

import zipfile
with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            with **zipfile.ZipFile(zfile.open(name))** as zfile2:
                    for name2 in zfile2.namelist():
                        # Now we can extract
                        logging.info( "Found internal internal file: " + name2)
                        print "Processing code goes here"

回答1:


When you use the .open() call on a ZipFile instance you indeed get an open file handle. However, to read a zip file, the ZipFile class needs a little more. It needs to be able to seek on that file, and the object returned by .open() is not seekable in your case. Only Python 3 (3.2 and up) produces a ZipExFile object that supports seeking (provided the underlying file handle for the outer zip file is seekable, and nothing is trying to write to the ZipFile object).

The workaround is to read the whole zip entry into memory using .read(), store it in a BytesIO object (an in-memory file that is seekable) and feed that to ZipFile:

from io import BytesIO

# ...
        zfiledata = BytesIO(zfile.read(name))
        with zipfile.ZipFile(zfiledata) as zfile2:

or, in the context of your example:

import zipfile
from io import BytesIO

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            zfiledata = BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info( "Found internal internal file: " + name2)
                    print "Processing code goes here"



回答2:


To get this to work with python33 (under windows but that might be unrelevant) i had to do :

 import zipfile, re, io
    with zipfile.ZipFile(file, 'r') as zfile:
        for name in zfile.namelist():
            if re.search(r'\.zip$', name) != None:
                zfiledata = io.BytesIO(zfile.read(name))
                with zipfile.ZipFile(zfiledata) as zfile2:
                    for name2 in zfile2.namelist():
                        print(name2)

cStringIO does not exist so i used io.BytesIO




回答3:


Here's a function I came up with. (Copied from here.)

def extract_nested_zipfile(path, parent_zip=None):
    """Returns a ZipFile specified by path, even if the path contains
    intermediary ZipFiles.  For example, /root/gparent.zip/parent.zip/child.zip
    will return a ZipFile that represents child.zip
    """

    def extract_inner_zipfile(parent_zip, child_zip_path):
        """Returns a ZipFile specified by child_zip_path that exists inside
        parent_zip.
        """
        memory_zip = StringIO()
        memory_zip.write(parent_zip.open(child_zip_path).read())
        return zipfile.ZipFile(memory_zip)

    if ('.zip' + os.sep) in path:
        (parent_zip_path, child_zip_path) = os.path.relpath(path).split(
            '.zip' + os.sep, 1)
        parent_zip_path += '.zip'

        if not parent_zip:
            # This is the top-level, so read from disk
            parent_zip = zipfile.ZipFile(parent_zip_path)
        else:
            # We're already in a zip, so pull it out and recurse
            parent_zip = extract_inner_zipfile(parent_zip, parent_zip_path)

        return extract_nested_zipfile(child_zip_path, parent_zip)
    else:
        if parent_zip:
            return extract_inner_zipfile(parent_zip, path)
        else:
            # If there is no nesting, it's easy!
            return zipfile.ZipFile(path)

Here's how I tested it:

echo hello world > hi.txt
zip wrap1.zip hi.txt
zip wrap2.zip wrap1.zip
zip wrap3.zip wrap2.zip

print extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap1.zip').open('hi.txt').read()
print extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap2.zip/wrap1.zip').open('hi.txt').read()
print extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap3.zip/wrap2.zip/wrap1.zip').open('hi.txt').read()


来源:https://stackoverflow.com/questions/12025469/how-to-read-from-a-zip-file-within-zip-file-in-python

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!