How to compress a large file in Python?

Submitted by 白昼怎懂夜的黑 on 2021-02-06 12:51:21

Question


The problem I'm experiencing is the name of the stored file. The stored file isn't named with the original/uncompressed file name. Instead the stored file is named with the archive name (with the appended ".gz" extension).

Expected Result:
file.txt.gz {archive name}
....file.txt {stored file name}

Actual Result:
file.txt.gz {archive name}
....file.txt.gz {stored file name}

I'm using the example code from the gzip documentation (https://docs.python.org/2.7/library/gzip.html):

import gzip
import shutil
with open('file.txt', 'rb') as f_in, gzip.open('file.txt.gz', 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

How do I get the archive to store the file with the name "file.txt" instead of "file.txt.gz"?


Answer 1:


You have to use gzip.GzipFile(); the shorthand gzip.open() won't do what you want.

Quoth the doc:

When fileobj is not None, the filename argument is only used to be included in the gzip file header, which may include the original filename of the uncompressed file. It defaults to the filename of fileobj, if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header.

Try this:

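import gzip
import shutil

with open('file.txt', 'rb') as f_in:
    # Open the output file ourselves so that its name (file.txt.gz) is
    # independent of the name written into the gzip header.
    with open('file.txt.gz', 'wb') as f_out:
        # The first argument is the filename recorded in the gzip header.
        with gzip.GzipFile('file.txt', 'wb', fileobj=f_out) as gz_out:
            shutil.copyfileobj(f_in, gz_out)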
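
One way to check which name actually ended up in the header is to read it back. The following is only a minimal sketch, assuming the header layout described in RFC 1952 (10 fixed bytes, an optional extra field, then an optional zero-terminated name). stored_name is a hypothetical helper written for this illustration, not part of the gzip module:

```python
import gzip
import io
import struct
from typing import Optional

FEXTRA = 0x04  # header flag: an "extra" field precedes the name
FNAME = 0x08   # header flag: a zero-terminated original filename is present


def stored_name(gz_bytes: bytes) -> Optional[str]:
    """Return the filename stored in a gzip header, or None if there is none.

    Hypothetical helper for illustration only.
    """
    if gz_bytes[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = gz_bytes[3]
    pos = 10  # magic(2) + method(1) + flags(1) + mtime(4) + xfl(1) + os(1)
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", gz_bytes, pos)
        pos += 2 + xlen  # skip the extra field and its length prefix
    if not flags & FNAME:
        return None
    end = gz_bytes.index(b"\x00", pos)
    return gz_bytes[pos:end].decode("latin-1")


# Write a gzip stream into memory, asking for 'file.txt' as the stored name.
buf = io.BytesIO()
with gzip.GzipFile(filename="file.txt", mode="wb", fileobj=buf) as gz_out:
    gz_out.write(b"hello world\n")

print(stored_name(buf.getvalue()))
```

To inspect a file on disk instead, read its first bytes with open('file.txt.gz', 'rb') and pass them to the same helper.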



Answer 2:


You are making a distinction between 'stored file name' and 'archive name', but for gzip this is the wrong way to think: gzip is not an archive format, just a compression format for a single stream of data.

When you store a 'gzip' file, it does not (necessarily) remember the original filename. There is only the compressed contents of the original file, which you can give any name you want. There is a convention to give it the same name as the original file but with ".gz" appended. The "gzip" and "gunzip" utilities on Unix systems will assume this if you only supply a filename:

gzip foo.txt
# now foo.txt has been deleted, and foo.txt.gz exists
gunzip foo.txt.gz
# now you have foo.txt back, and foo.txt.gz has been deleted.

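If you rename foo.txt.gz to bar.txt.gz, and then gunzip, you will get 'bar.txt' by default if you use Unix gunzip, because it just strips the ".gz" suffix; 'gunzip -N' restores the stored name 'foo.txt' instead (other utilities might do something different).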

However, you can use gzip and gunzip in stream mode, in which case they know nothing about filenames - gzip is really about compression, and doesn't care about filenames.
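
In Python, the same stream-mode idea can be sketched with gzip.compress() and gzip.decompress(), which work on bytes in memory. This is only an illustration, assuming Python 3.2 or later; the payload is made up for the example:

```python
import gzip

# Data that never lived in a file, so there is no filename to remember.
data = b"some payload that only exists in memory\n" * 100

packed = gzip.compress(data)        # bytes in, gzip-framed bytes out
restored = gzip.decompress(packed)  # round trip back to the original bytes

# Bit 0x08 of the header's flag byte marks a stored filename (FNAME).
has_name = bool(packed[3] & 0x08)

print(restored == data, has_name)
```

Whatever name you later give to a file holding these bytes is purely a matter of convention; the compressed stream itself carries only the data.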

(Edit: the gzip header can store a filename, but not in every case (for example, when there is no original "file", only data), and whether it is used when decompressing is entirely optional.)



Source: https://stackoverflow.com/questions/38018071/how-to-compress-a-large-file-in-python
