unzipping file results in “BadZipFile: File is not a zip file”

甜味超标 2020-12-16 09:13

I have two zip files, both of them open well with Windows Explorer and 7-zip.

However, when I open them with Python's zipfile module [ zipfile.ZipFile("filex.zip"

7 answers
  • 2020-12-16 09:29

    Show the full traceback that you got from Python -- this may give a hint as to what the specific problem is. Unanswered: What software produced the bad file, and on what platform?

    Update: The traceback indicates a problem detecting the "End of Central Directory" record in the file -- see function _EndRecData starting at line 128 of C:\Python25\Lib\zipfile.py

    Suggestions:
    (1) Trace through the above function
    (2) Try it on the latest Python
    (3) Answer the question above.
    (4) Read this and anything else found by google("BadZipfile: File is not a zip file") that appears to be relevant
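
To see what _EndRecData is looking for, here is a minimal sketch that searches a file's bytes for the End of Central Directory signature and unpacks the fixed 22-byte record that follows it. The helper name inspect_eocd and the returned keys are hypothetical (they are not part of zipfile); the field layout is the one described in the ZIP APPNOTE. Treat it as a diagnostic aid only: the last occurrence of the signature is a heuristic, since the same bytes could in principle also appear in an archive comment.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"            # same bytes as '\x50\x4b\x05\x06'
EOCD_FORMAT = "<4s4H2LH"                  # signature, 4 shorts, 2 longs, comment length
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def inspect_eocd(data):
    """Describe the last EOCD record found in `data`, or return None if there is none."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos == -1 or pos + EOCD_SIZE > len(data):
        return None
    (_sig, _disk, _cd_disk, _entries_on_disk, total_entries,
     cd_size, cd_offset, comment_len) = struct.unpack(
        EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
    return {
        "eocd_offset": pos,
        "entries": total_entries,
        "central_dir_size": cd_size,
        "central_dir_offset": cd_offset,
        "comment_length": comment_len,
        # Bytes left over after the record and its declared comment.
        "trailing_bytes": len(data) - (pos + EOCD_SIZE + comment_len),
    }


# Demo: a one-entry archive built in memory, with 4 stray bytes appended.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
info = inspect_eocd(buf.getvalue() + b"junk")
print(info)
```

If this returns None, the file has no End of Central Directory record at all (it may be truncated or not a zip). A non-zero trailing_bytes value means there is extra data after the record, which is the situation some older zipfile versions could not cope with.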

  • 2020-12-16 09:40

    astronautlevel's solution works for most cases, but the compressed data and CRCs in the zip can also contain the same 4 bytes. You should do an rfind (not find), seek to pos+20, truncate there, and then write \x00\x00 to the end of the file (this tells zip applications that the 'comments' section is 0 bytes long).

    
    def strip_trailing_zip_data(zipFileContainer):
        # The function name is illustrative. zipFileContainer must be a seekable binary
        # file object opened for reading and writing (e.g. mode 'r+b'), positioned at 0.
        #
        # HACK: See http://bugs.python.org/issue10694
        # The zip file generated is correct, but because of extra data after the 'central directory' section,
        # some versions of Python (and some zip applications) can't read the file. By removing the extra data,
        # we ensure that all applications can read the zip without issue.
        # The ZIP format: http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT
        # Finding the end of the central directory:
        #   http://stackoverflow.com/questions/8593904/how-to-find-the-position-of-central-directory-in-a-zip-file
        #   http://stackoverflow.com/questions/20276105/why-cant-python-execute-a-zip-archive-passed-via-stdin
        #       This second link is only loosely related, but echoes the first, "processing a ZIP archive often requires backwards seeking"
        content = zipFileContainer.read()
        pos = content.rfind(b'\x50\x4b\x05\x06')  # reverse find: these bytes are the signature of the zip's end of central directory record.
        if pos >= 0:  # rfind returns -1 when the signature is absent
            zipFileContainer.seek(pos + 20)  # +20: see section V.I in the 'ZIP format' link above.
            zipFileContainer.truncate()
            zipFileContainer.write(b'\x00\x00')  # Zip file comment length: 0 bytes; tells zip applications to stop reading.
            zipFileContainer.seek(0)

        return zipFileContainer
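
A quick way to convince yourself that this kind of repair does what it says is to try it on an in-memory archive. The sketch below is Python 3 and makes a few assumptions: strip_trailing_data is a hypothetical stand-in that follows the same steps as the snippet above (rfind, seek to pos+20, truncate, write a zero comment length), and the "damage" is simulated by appending a made-up multipart-style trailer to a freshly written zip.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory signature


def strip_trailing_data(fileobj):
    """Cut a seekable binary file right after its last EOCD record (comment length set to 0)."""
    fileobj.seek(0)
    content = fileobj.read()
    pos = content.rfind(EOCD_SIGNATURE)
    if pos == -1:
        raise ValueError("no End of Central Directory signature found")
    fileobj.seek(pos + 20)      # offset of the 2-byte comment-length field
    fileobj.truncate()          # drop the old comment length and everything after it
    fileobj.write(b"\x00\x00")  # comment length = 0, so the file ends at pos + 22
    fileobj.seek(0)
    return fileobj


# Build a small archive, remember its size, then append junk after it.
damaged = io.BytesIO()
with zipfile.ZipFile(damaged, "w") as zf:
    zf.writestr("hello.txt", "hello world")
clean_size = len(damaged.getvalue())
damaged.seek(0, io.SEEK_END)
damaged.write(b"\r\n--boundary--\r\n")  # simulated trailing data

repaired = strip_trailing_data(damaged)
names = zipfile.ZipFile(repaired).namelist()
print(names, len(repaired.getvalue()) == clean_size)
```

Recent Python versions tend to tolerate a small amount of trailing data on their own, so the point of the demo is only that the repaired file ends exactly at the end of the record and still lists its entries. Work on a copy of a real file: truncation is destructive.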
  • 2020-12-16 09:43

    Have you tried a newer Python or, if that is too much trouble, simply a newer zipfile.py? I have successfully used a copy of zipfile.py from Python 2.6.2 (the latest at the time) with Python 2.5 in order to open some zip files that weren't supported by Python 2.5's zipfile module.

  • 2020-12-16 09:44

    Files named file can confuse Python, so try naming it something else. If it STILL won't work, try this code:

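    def fixBadZipfile(zipFile):
        # Open in binary read/write mode so the file can be truncated in place.
        with open(zipFile, 'r+b') as f:
            data = f.read()
            # End of central directory signature. Note that find() returns the first
            # match, and the same 4 bytes can also occur inside compressed data.
            pos = data.find(b'\x50\x4b\x05\x06')
            if pos >= 0:
                print("Truncating file at location " + str(pos + 22) + ".")
                f.seek(pos + 22)   # size of 'ZIP end of central directory record'
                f.truncate()
            else:
                # Signature not found: the file itself is truncated
                raise ValueError("End of central directory signature not found; file is truncated")
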
  • 2020-12-16 09:47

    I ran into the same issue. My problem was that the file was a gzip file rather than a zip file. I switched to the class gzip.GzipFile and it worked like a charm.
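
If you are not sure which of the two you have, you can check the file's contents instead of trusting the extension. This is a rough sketch: detect_archive_type is a hypothetical helper, and the demo files are created in a temporary directory purely for illustration. It relies on zipfile.is_zipfile from the standard library and on the two-byte gzip magic number \x1f\x8b.

```python
import gzip
import os
import tempfile
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def detect_archive_type(path):
    """Return 'zip', 'gzip', or 'unknown' based on the file's contents, not its name."""
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as f:
        if f.read(2) == GZIP_MAGIC:
            return "gzip"
    return "unknown"


# Demo: gzip data saved under a misleading .zip name, next to a real zip.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "misnamed.zip")
with gzip.open(gz_path, "wb") as f:
    f.write(b"payload")

zip_path = os.path.join(tmpdir, "real.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("a.txt", "hello")

for path in (gz_path, zip_path):
    print(os.path.basename(path), "->", detect_archive_type(path))

# A file detected as gzip can then be read with the gzip module instead.
with gzip.open(gz_path, "rb") as f:
    gz_content = f.read()
```

A file reported as 'gzip' here is exactly the case where zipfile.ZipFile raises "File is not a zip file" even though other tools open it without complaint.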

  • 2020-12-16 09:52

    I had the same problem and was able to solve it for my files; see my answer at "zipfile cant handle some type of zip data?"
