Python: Creating a streaming gzip'd file-like?

面向向阳花 2020-12-24 07:06

I'm trying to figure out the best way to compress a stream with Python's zlib.

I've got a file-like input stream (input, below) and an o

5 answers
  •  猫巷女王i
    2020-12-24 07:46

    Here is a cleaner, non-self-referencing version based on Ricardo Cárdenes' very helpful answer.

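    from collections import deque
    from gzip import GzipFile


    CHUNK = 16 * 1024  # bytes read from the input per iteration


    class Buffer:
        """Minimal FIFO of byte chunks, used as GzipFile's output file."""

        def __init__(self):
            self.__buf = deque()  # pending chunks, oldest first
            self.__size = 0       # total number of buffered bytes

        def __len__(self):
            return self.__size

        def write(self, data):
            self.__buf.append(data)
            self.__size += len(data)

        def read(self, size=-1):
            if size < 0:
                size = self.__size
            ret_list = []
            while size > 0 and self.__buf:
                s = self.__buf.popleft()
                size -= len(s)
                ret_list.append(s)
            if size < 0:
                # The last chunk overshot the request by -size bytes:
                # return its head and push the tail back for the next read.
                ret_list[-1], remainder = ret_list[-1][:size], ret_list[-1][size:]
                self.__buf.appendleft(remainder)
            ret = b''.join(ret_list)  # GzipFile writes bytes on Python 3
            self.__size -= len(ret)
            return ret

        def flush(self):
            pass

        def close(self):
            pass


    class GzipCompressReadStream:
        """Read-only file-like that yields the gzip-compressed bytes of fileobj."""

        def __init__(self, fileobj):
            self.__input = fileobj
            self.__buf = Buffer()
            self.__gzip = GzipFile(None, mode='wb', fileobj=self.__buf)

        def read(self, size=-1):
            # Compress more input until enough output is buffered or input ends.
            while size < 0 or len(self.__buf) < size:
                s = self.__input.read(CHUNK)
                if not s:
                    # End of input: closing the GzipFile flushes the deflate
                    # stream and writes the gzip trailer into the buffer.
                    self.__gzip.close()
                    break
                self.__gzip.write(s)
            return self.__buf.read(size)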
    

    Advantages:

    • Avoids repeated string concatenation, which would cause the entire string to be copied repeatedly.
    • Reads a fixed CHUNK size from the input stream, instead of reading whole lines at a time (which can be arbitrarily long).
    • Avoids circular references.
    • Avoids the misleading public "write" method of GzipCompressStream(), which is really only used internally.
    • Takes advantage of name mangling for internal member variables.
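    Since the question asks about zlib specifically, here is a minimal sketch of the same read-side wrapper built directly on zlib.compressobj, with no intermediate GzipFile or Buffer. This is a swapped-in technique, not the approach from the answer above. It assumes Python 3 and a binary input stream; the class name GzipReader and its chunk_size/level parameters are hypothetical. Passing wbits=16 + zlib.MAX_WBITS makes zlib wrap the deflate data in a basic gzip header and trailer, and subclassing io.RawIOBase means only readinto() has to be written: read() and readall() come from the base class, and the object can be wrapped in io.BufferedReader.

```python
import io
import zlib

CHUNK = 16 * 1024  # bytes read from the input per iteration


class GzipReader(io.RawIOBase):
    """Hypothetical read-only raw stream of the gzip-compressed bytes of fileobj."""

    def __init__(self, fileobj, chunk_size=CHUNK, level=9):
        super().__init__()
        self._input = fileobj
        self._chunk_size = chunk_size
        # 16 + MAX_WBITS: deflate output wrapped in a basic gzip header/trailer.
        # level=9 mirrors GzipFile's default compresslevel.
        self._compressor = zlib.compressobj(level=level, wbits=16 + zlib.MAX_WBITS)
        self._pending = b""  # compressed bytes not yet handed to the caller
        self._eof = False    # True once the compressor has been flushed

    def readable(self):
        return True

    def readinto(self, b):
        # compress() may return b"" while zlib buffers input internally,
        # so keep feeding input until there is output or the input ends.
        while not self._pending and not self._eof:
            data = self._input.read(self._chunk_size)
            if data:
                self._pending = self._compressor.compress(data)
            else:
                # End of input: flush() emits the remaining deflate data plus
                # the gzip trailer; after this, readinto() returns 0 (EOF).
                self._pending = self._compressor.flush()
                self._eof = True
        # Copy as much pending output as fits into the caller's buffer.
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n
```

    For example, io.BufferedReader(GzipReader(open("input.bin", "rb"))) can be passed to anything that reads a binary file object, such as shutil.copyfileobj(); "input.bin" is a placeholder path. Keeping at most one compress() result in _pending bounds memory to roughly one chunk's worth of output, at the cost of slicing that bytes object on each partial read.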
