Random access gzip stream

后端 未结 4 1420
余生分开走
余生分开走 2020-12-15 08:32

I\'d like to be able to do random access into a gzipped file. I can afford to do some preprocessing on it (say, build some kind of index), provided that the result of the pr

4条回答
  •  鱼传尺愫
    2020-12-15 09:03

    interesting question. I don't understand why your 2nd option (recompress file in chunks) would double the disk space. Seems to me it would be the same, less a small amount of overhead. If you have control over the compression piece, then that seems like the right idea.

    Maybe what you mean is that you don't have control over the input, and therefore it would double.

    If you can do it, I'm imagining modelling it as a CompressedFileStream class that uses as its backing store, a series of 1mb gzip'd blobs. When reading, a Seek() on the stream would move to the appropriate blob and decompress. A Read() past the end of a blob would cause the stream to open the next blob.

    ps: GZIP is described in IETF RFC 1952, but it uses DEFLATE for the compression format. There'd be no reason to use the GZIP elaboration if you implemented this CompressedFileStream class as I've imagined it.

提交回复
热议问题