How to read from a zip file within a zip file in Python?

青春惊慌失措 2020-12-15 03:08

I have a file that I want to read that is itself zipped within a zip archive. For example, parent.zip contains child.zip, which contains child.txt. I am having trouble readi

3 Answers
  •  [愿得一人]
    2020-12-15 03:51

    When you call .open() on a ZipFile instance you do get an open file handle. However, to read a zip file, the ZipFile class needs more than that: it must be able to seek on the file, and the object returned by .open() is not seekable in your case. Only Python 3 (3.7 and up) produces a ZipExtFile object that supports seeking (provided the underlying file handle for the outer zip file is seekable, and nothing is trying to write to the ZipFile object).
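    To see which case applies on a given interpreter, the handle can be asked directly. The following is only a minimal sketch: `build_nested_zip` and `inner_handle_is_seekable` are hypothetical helper names, and the archive is built in memory purely for illustration.

```python
import io
import zipfile


def build_nested_zip():
    """Return the bytes of a parent archive holding child.zip, which holds child.txt."""
    child_buf = io.BytesIO()
    with zipfile.ZipFile(child_buf, "w") as child:
        child.writestr("child.txt", "hello from child")
    parent_buf = io.BytesIO()
    with zipfile.ZipFile(parent_buf, "w") as parent:
        parent.writestr("child.zip", child_buf.getvalue())
    return parent_buf.getvalue()


def inner_handle_is_seekable(parent_bytes, member="child.zip"):
    """Report whether the handle returned by ZipFile.open() supports seeking."""
    with zipfile.ZipFile(io.BytesIO(parent_bytes)) as parent:
        with parent.open(member) as handle:
            return handle.seekable()


if __name__ == "__main__":
    print(inner_handle_is_seekable(build_nested_zip()))
```

    If this reports False, the inner archive cannot be handed to ZipFile as-is and has to be copied into a seekable buffer first.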

    The workaround is to read the whole zip entry into memory using .read(), store it in a BytesIO object (an in-memory file that is seekable) and feed that to ZipFile:

    from io import BytesIO
    
    # ...
            zfiledata = BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
    

    or, in the context of your example:

    import logging
    import re
    import zipfile
    from io import BytesIO
    
    with zipfile.ZipFile("parent.zip", "r") as zfile:
        for name in zfile.namelist():
            if re.search(r'\.zip$', name) is not None:
                # We have a zip within a zip: copy it into a seekable in-memory buffer
                zfiledata = BytesIO(zfile.read(name))
                with zipfile.ZipFile(zfiledata) as zfile2:
                    for name2 in zfile2.namelist():
                        # Now we can extract
                        logging.info("Found internal file: %s", name2)
                        print("Processing code goes here")
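
    Putting both points together, one possible sketch passes the inner handle straight to ZipFile when it is seekable and falls back to the BytesIO copy otherwise. The names `read_nested_member` and `build_nested_zip` are hypothetical, and the sample archive is generated in memory so the example can be run without any files on disk.

```python
import io
import zipfile


def build_nested_zip():
    """Return the bytes of a parent archive holding child.zip, which holds child.txt."""
    child_buf = io.BytesIO()
    with zipfile.ZipFile(child_buf, "w") as child:
        child.writestr("child.txt", "hello from child")
    parent_buf = io.BytesIO()
    with zipfile.ZipFile(parent_buf, "w") as parent:
        parent.writestr("child.zip", child_buf.getvalue())
    return parent_buf.getvalue()


def read_nested_member(parent, child_name, inner_name):
    """Return the bytes of inner_name stored in child_name inside parent.

    parent may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(parent) as outer:
        with outer.open(child_name) as child_handle:
            if child_handle.seekable():
                # The handle can be used directly; no full copy is made here.
                source = child_handle
            else:
                # Fallback: copy the whole entry into a seekable buffer.
                source = io.BytesIO(child_handle.read())
            with zipfile.ZipFile(source) as inner:
                return inner.read(inner_name)


if __name__ == "__main__":
    data = build_nested_zip()
    print(read_nested_member(io.BytesIO(data), "child.zip", "child.txt"))
```

    The direct path avoids holding the entire inner archive in memory, which may matter for large nested archives; the BytesIO fallback keeps the function usable where the handle is not seekable.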
    
