How to stream a gzip built on the fly in Python?

て烟熏妆下的殇ゞ submitted on 2021-02-08 09:51:15

Question


I'd like to stream a big log file over the network using asyncio. I retrieve the data from the database, format it, compress it using Python's zlib, and stream it over the network.

Here is basically the code I use:

@asyncio.coroutine
def logs(request):
    # ...

    yield from resp.prepare(request)

    # gzip magic number and compression format
    resp.write(b'\x1f\x8b\x08\x00\x00\x00\x00\x00')
    compressor = compressobj()
    for row in rows:
        ip, uid, date, url, answer, volume = row
        NCSA_ROW = '{} {} - [{}] "GET {} HTTP/1.0" {} {}\n'
        row = NCSA_ROW.format(ip, uid, date, url, answer, volume)
        row = row.encode('utf-8')
        data = compressor.compress(row)
        resp.write(data)
    resp.write(compressor.flush())
    return resp

The file I retrieve cannot be opened with gunzip, and zcat raises the following error:

gzip: out.gz: unexpected end of file

Answer 1:


Your gzip header is wrong (8 bytes instead of 10), and you follow it with a zlib stream, which uses a different header and trailer. Even with a correct gzip header, and even if you had written a raw deflate stream instead of a zlib stream, you would still be missing the gzip trailer (a CRC-32 of the uncompressed data followed by its length).
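To make the difference between the three containers concrete, here is a small sketch that compresses the same bytes as a zlib stream, a raw deflate stream, and a gzip stream, then looks at the framing around each. The `deflate` helper and the sample `data` are made up for illustration; only the `zlib` and `struct` calls come from the standard library.

```python
import struct
import zlib

# Hypothetical sample payload, only for illustration.
data = b'hello gzip\n' * 10

def deflate(payload, wbits):
    """Compress payload in one go with the container selected by wbits."""
    compressor = zlib.compressobj(wbits=wbits)
    return compressor.compress(payload) + compressor.flush()

zlib_stream = deflate(data, 15)    # zlib container (the default)
raw_stream = deflate(data, -15)    # raw deflate, no header or trailer
gzip_stream = deflate(data, 31)    # gzip container (16 + 15)

# zlib: 2-byte header, then a big-endian Adler-32 of the input as trailer.
zlib_trailer = struct.pack('>I', zlib.adler32(data))

# gzip: 10-byte header, then little-endian CRC-32 and input length as trailer.
gzip_trailer = struct.pack('<II', zlib.crc32(data), len(data))

print(zlib_stream[:2].hex(), zlib_stream[-4:] == zlib_trailer)
print(gzip_stream[:10].hex(), gzip_stream[-8:] == gzip_trailer)
```

Prepending a hand-written header to the zlib stream therefore produces neither a valid gzip header nor the CRC-32 and length trailer that gunzip expects at the end of the file.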

To do this right, do not attempt to write your own gzip header. Instead request that zlib write a complete gzip stream, which will write the correct header, compressed data, and trailer. You can do this by providing a wbits value of 31 to compressobj().
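A minimal sketch of that approach, assuming Python 3 and rows that are already formatted as strings. The `gzip_chunks` generator and the sample rows are hypothetical names used for illustration; the network write itself is left out.

```python
import gzip
import zlib

def gzip_chunks(lines):
    """Yield the pieces of one complete gzip stream built from str lines."""
    # wbits=31 (16 + 15) asks zlib to emit the gzip header and trailer itself.
    compressor = zlib.compressobj(wbits=31)
    for line in lines:
        data = compressor.compress(line.encode('utf-8'))
        if data:  # compress() may buffer input and return b''
            yield data
    # flush() emits the remaining compressed data plus the gzip trailer.
    yield compressor.flush()

# Hypothetical sample rows in the NCSA-like format from the question.
rows = [
    '127.0.0.1 alice - [21/Jun/2016] "GET / HTTP/1.0" 200 512\n',
    '127.0.0.1 bob - [21/Jun/2016] "GET /logs HTTP/1.0" 404 0\n',
]

payload = b''.join(gzip_chunks(rows))
# The standard gzip module can read the result back.
restored = gzip.decompress(payload).decode('utf-8')
print(restored == ''.join(rows))
```

In the handler from the question, each yielded chunk would be passed to resp.write() instead of being joined in memory, and the manual write of the magic number would be dropped.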



Source: https://stackoverflow.com/questions/37944801/how-to-stream-a-gzip-built-on-the-fly-in-python
