Python Gzip - Appending to file on the fly

Submitted by 有些话、适合烂在心里 on 2019-12-19 05:08:59

Question


Is it possible to append to a gzipped text file on the fly using Python?

Basically I am doing this:

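import gzip

content = "Lots of content here"
# Text-mode append ('at') so a str can be written on Python 3;
# each open/close cycle adds a separate gzip member to the file.
with gzip.open('file.txt.gz', 'at', compresslevel=9) as f:
    f.write(content)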

A line is appended (note "appended") to the file every 6 seconds or so, but the resulting file is just as big as a standard uncompressed file (roughly 1MB when done).

Explicitly specifying the compression level does not seem to make a difference either.

If I gzip an existing uncompressed file afterwards, its size comes down to roughly 80 KB.

I'm guessing it's not possible to "append" to a gzip file on the fly and have it compress?

Is this a case of writing to a StringIO buffer and then flushing to a gzip file when done?


Answer 1:


That works in the sense of creating and maintaining a valid gzip file, since the gzip format permits concatenated gzip streams.
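A minimal sketch of that point, assuming Python 3 and an in-memory buffer standing in for the file: each call appends one complete gzip member, and the concatenation still decompresses as a single stream. The helper name `append_member` is hypothetical, chosen only for illustration.

```python
import gzip
import io


def append_member(buf: io.BytesIO, text: str) -> None:
    """Append one complete gzip member containing `text` to `buf` (hypothetical helper)."""
    buf.seek(0, io.SEEK_END)
    buf.write(gzip.compress(text.encode("utf-8")))


buf = io.BytesIO()
for line in ("first line\n", "second line\n", "third line\n"):
    append_member(buf, line)

# gzip.decompress reads multi-member data, so the three members
# come back as one continuous piece of text.
restored = gzip.decompress(buf.getvalue()).decode("utf-8")
print(restored, end="")
```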

However, it doesn't work in the sense that you get lousy compression, since you are giving each instance of gzip compression so little data to work with. Compression depends on taking advantage of the history of previous data, but here gzip has been given essentially none.
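The effect can be measured with a rough sketch like the one below. The log lines are synthetic data made up for this illustration, not the asker's content, so the exact sizes will differ from the 1 MB and 80 KB figures in the question.

```python
import gzip

# Synthetic, repetitive log lines (an assumption made for this demo).
lines = [f"sensor reading number {i} status ok\n".encode("utf-8") for i in range(2000)]

raw = sum(len(line) for line in lines)

# One gzip member per line: every member starts with no history and
# also carries its own gzip header and trailer.
per_line = sum(len(gzip.compress(line, 9)) for line in lines)

# One gzip member for all lines: the compressor sees the full history.
single = len(gzip.compress(b"".join(lines), 9))

print("raw bytes:          ", raw)
print("one member per line:", per_line)
print("single member:      ", single)
```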

You could either a) accumulate at least a few K of data, many of your lines, before invoking gzip to add another gzip stream to the file, or b) do something much more sophisticated that appends to a single gzip stream, leaving a valid gzip stream each time and permitting efficient compression of the data.
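Option a) could be sketched roughly as follows. The class name `BufferedGzipAppender`, the `min_bytes` threshold, and the temporary file path are all hypothetical and chosen for illustration. This is not gzlog, only the simpler buffering approach.

```python
import gzip
import os
import tempfile


class BufferedGzipAppender:
    """Hypothetical helper: collect lines in memory and append them
    to the file as one gzip member per flush."""

    def __init__(self, path, min_bytes=4096, level=9):
        self.path = path
        self.min_bytes = min_bytes  # flush once this much data is pending
        self.level = level
        self._pending = []
        self._size = 0

    def write_line(self, line):
        data = line.encode("utf-8")
        self._pending.append(data)
        self._size += len(data)
        if self._size >= self.min_bytes:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        # Binary append mode adds a new gzip member after the existing ones.
        with gzip.open(self.path, "ab", compresslevel=self.level) as f:
            f.write(b"".join(self._pending))
        self._pending.clear()
        self._size = 0


# Usage: a small threshold so the demo flushes several times.
path = os.path.join(tempfile.mkdtemp(), "log.txt.gz")
log = BufferedGzipAppender(path, min_bytes=256)
expected = []
for i in range(100):
    line = f"log entry number {i}\n"
    expected.append(line)
    log.write_line(line)
log.flush()  # write whatever is still pending

with gzip.open(path, "rt", encoding="utf-8") as f:
    restored = f.read()
```

The trade-off is that lines still held in memory are lost if the process dies before the next flush, so the threshold is a balance between compression ratio and how much data you can afford to lose.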

You can find an example of b) in C, in gzlog.h and gzlog.c, in the examples directory of the zlib source distribution. I do not believe that Python has all of the interfaces to zlib needed to implement gzlog directly in Python, but you could interface to the C code from Python.



Source: https://stackoverflow.com/questions/18097107/python-gzip-appending-to-file-on-the-fly
