Create a zip file from a generator in Python?


佛祖请我去吃肉 2020-11-30 07:32

I've got a large amount of data (a couple gigs) I need to write to a zip file in Python. I can't load it all into memory at once to pass to the .writestr method of ZipFil

10 Answers

爱一瞬间的悲伤 (original poster)

2020-11-30 08:34
Changed in Python 3.5 (from official docs): Added support for writing to unseekable streams.

This means that zipfile.ZipFile can now write to a stream that does not keep the entire archive in memory. Such a stream does not support seeking: data can only be appended, and the writer can never move back over what it has already written.

So here is a simple generator:
```
from zipfile import ZipFile, ZipInfo

def zipfile_generator(path, stream):
    with ZipFile(stream, mode='w') as zf:
        z_info = ZipInfo.from_file(path)
        with open(path, 'rb') as entry, zf.open(z_info, mode='w') as dest:
            for chunk in iter(lambda: entry.read(16384), b''):
                dest.write(chunk)
                # Yield chunk of the zip file stream in bytes.
                yield stream.get()
    # ZipFile was closed.
    yield stream.get()
```
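path is the path of the large file, given as a string or a path-like object. Because the generator opens it with open(path, 'rb'), it has to point to a regular file, not a directory.

stream is an instance of an unseekable stream class like this one (designed according to the official docs):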
```
from io import RawIOBase

class UnseekableStream(RawIOBase):
    def __init__(self):
        self._buffer = b''

    def writable(self):
        return True

    def write(self, b):
        if self.closed:
            raise ValueError('Stream was closed!')
        self._buffer += b
        return len(b)

    def get(self):
        chunk = self._buffer
        self._buffer = b''
        return chunk
```
You can try this code online: https://repl.it/@IvanErgunov/zipfilegenerator
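To show how the generator might be consumed, here is a minimal sketch that drains it into a file on disk and then reads the archive back as a check. The two definitions from above are repeated in condensed form only so that the sketch runs on its own. The helper name `stream_zip_to_file` and the file names `data.bin` and `out.zip` are made up for illustration. In a real application the chunks would more likely go to a socket or an HTTP response than to a local file.

```python
import os
import tempfile
from io import RawIOBase
from zipfile import ZipFile, ZipInfo


class UnseekableStream(RawIOBase):
    """Condensed copy of the write-only buffer stream from the answer."""

    def __init__(self):
        self._buffer = b''

    def writable(self):
        return True

    def write(self, b):
        self._buffer += b
        return len(b)

    def get(self):
        # Hand out everything buffered so far and start a fresh buffer.
        chunk, self._buffer = self._buffer, b''
        return chunk


def zipfile_generator(path, stream):
    """Condensed copy of the generator from the answer."""
    with ZipFile(stream, mode='w') as zf:
        z_info = ZipInfo.from_file(path)
        with open(path, 'rb') as entry, zf.open(z_info, mode='w') as dest:
            for chunk in iter(lambda: entry.read(16384), b''):
                dest.write(chunk)
                yield stream.get()
    # Closing the ZipFile writes the central directory; yield that too.
    yield stream.get()


def stream_zip_to_file(src_path, zip_path):
    """Hypothetical helper: write every yielded chunk to zip_path."""
    total = 0
    with open(zip_path, 'wb') as out:
        for chunk in zipfile_generator(src_path, UnseekableStream()):
            out.write(chunk)
            total += len(chunk)
    return total


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'data.bin')
        with open(src, 'wb') as f:
            f.write(os.urandom(100_000))
        dst = os.path.join(tmp, 'out.zip')
        written = stream_zip_to_file(src, dst)
        with ZipFile(dst) as zf:
            # testzip() returns None when every member passes its CRC check.
            print(written, zf.testzip())
```

Only one 16 KiB read plus the zip overhead for it is buffered at any moment, so memory use should stay roughly constant regardless of the input size.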

There is also another way to create a generator, without ZipInfo and without manually reading and splitting your large file. You can pass a queue.Queue() object to your UnseekableStream() object and write to this queue in another thread. Then, in the current thread, you can simply iterate over the chunks read from this queue. See the docs.
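The queue-based variant could look roughly like the sketch below. It is one possible reading of the idea, not the author's code: the names `QueueStream`, `_produce`, `zip_chunks` and the `_DONE` sentinel are made up for illustration, and the choice of `ZIP_DEFLATED` and a queue size of 16 are assumptions. A worker thread lets `ZipFile.write()` do the reading and chunking, while the calling thread pulls the resulting bytes off the queue.

```python
import os
import queue
import tempfile
import threading
from io import RawIOBase
from zipfile import ZipFile, ZIP_DEFLATED

_DONE = object()  # sentinel: the worker has finished writing the archive


class QueueStream(RawIOBase):
    """Hypothetical write-only stream that forwards each write to a queue."""

    def __init__(self, q):
        super().__init__()
        self._queue = q

    def writable(self):
        return True

    def write(self, b):
        if self.closed:
            raise ValueError('Stream was closed!')
        data = bytes(b)  # copy, in case the caller reuses its buffer
        if data:
            self._queue.put(data)  # blocks while the queue is full
        return len(data)


def _produce(path, q):
    """Worker thread body: zip `path` into the queue."""
    try:
        with ZipFile(QueueStream(q), mode='w', compression=ZIP_DEFLATED) as zf:
            # ZipFile.write() reads the source file in small blocks itself.
            zf.write(path, arcname=os.path.basename(path))
    except BaseException as exc:
        q.put(exc)  # hand the error over to the consuming thread
    else:
        q.put(_DONE)


def zip_chunks(path, maxsize=16):
    """Yield the bytes of a zip archive containing the file at `path`."""
    q = queue.Queue(maxsize=maxsize)  # bounded, so memory use stays small
    # Daemon thread: if the consumer stops early, the worker stays blocked
    # on put() but does not keep the interpreter alive at exit.
    worker = threading.Thread(target=_produce, args=(path, q), daemon=True)
    worker.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        if isinstance(item, BaseException):
            raise item
        yield item
    worker.join()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'data.txt')
        with open(src, 'wb') as f:
            f.write(b'hello zip ' * 20000)
        size = sum(len(chunk) for chunk in zip_chunks(src))
        print('archive bytes:', size)
```

The bounded queue provides back-pressure: when the consumer is slow, the worker blocks in `put()` instead of buffering the whole archive.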

P.S. Python Zipstream by allanlei is an outdated and unreliable approach. It was an attempt to add support for unseekable streams before that was done officially.