I have two zip files, both of them open well with Windows Explorer and 7-zip.
However when i open them with Python\'s zipfile module [ zipfile.ZipFile(\"filex.zip\"
Show the full traceback that you got from Python -- this may give a hint as to what the specific problem is. Unanswered: What software produced the bad file, and on what platform?
Update: Traceback indicates having problem detecting the "End of Central Directory" record in the file -- see function _EndRecData starting at line 128 of C:\Python25\Lib\zipfile.py
Suggestions:
(1) Trace through the above function
(2) Try it on the latest Python
(3) Answer the question above.
(4) Read this and anything else found by google("BadZipfile: File is not a zip file") that appears to be relevant
astronautlevel's solution works for most cases, but the compressed data and CRCs in the Zip can also contain the same 4 bytes. You should do an rfind (not find), seek to pos+20 and then add write \x00\x00 to the end of the file (tell zip applications that the length of the 'comments' section is 0 bytes long).
# HACK: See http://bugs.python.org/issue10694
# The zip file generated is correct, but because of extra data after the 'central directory' section,
# Some version of python (and some zip applications) can't read the file. By removing the extra data,
# we ensure that all applications can read the zip without issue.
# The ZIP format: http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT
# Finding the end of the central directory:
# http://stackoverflow.com/questions/8593904/how-to-find-the-position-of-central-directory-in-a-zip-file
# http://stackoverflow.com/questions/20276105/why-cant-python-execute-a-zip-archive-passed-via-stdin
# This second link is only losely related, but echos the first, "processing a ZIP archive often requires backwards seeking"
content = zipFileContainer.read()
pos = content.rfind('\x50\x4b\x05\x06') # reverse find: this string of bytes is the end of the zip's central directory.
if pos>0:
zipFileContainer.seek(pos+20) # +20: see secion V.I in 'ZIP format' link above.
zipFileContainer.truncate()
zipFileContainer.write('\x00\x00') # Zip file comment length: 0 byte length; tell zip applications to stop reading.
zipFileContainer.seek(0)
return zipFileContainer
Have you tried a newer python, or if that is too much trouble, simply a newer zipfile.py? I have successfully used a copy of zipfile.py from Python 2.6.2 (latest at the time) with Python 2.5 in order to open some zip files that weren't supported by Py2.5s zipfile module.
files named file can confuse python - try naming it something else. if it STILL wont work, try this code:
def fixBadZipfile(zipFile):
f = open(zipFile, 'r+b')
data = f.read()
pos = data.find('\x50\x4b\x05\x06') # End of central directory signature
if (pos > 0):
self._log("Trancating file at location " + str(pos + 22)+ ".")
f.seek(pos + 22) # size of 'ZIP end of central directory record'
f.truncate()
f.close()
else:
# raise error, file is truncated
I run into the same issue. My problem was that it was a gzip instead of a zip file. I switched to the class gzip.GzipFile and it worked like a charm.
I had the same problem and was able to solve this issue for my files, see my answer at zipfile cant handle some type of zip data?