How to use multiple threads for zlib compression (same input source)

Submitted by 拈花ヽ惹草 on 2019-12-10 14:09:18

Question


My goal is to compress data from a single source in parallel threads. I have defined a list of jobs, each holding a portion of the input that has been read (500 KB to 1 MB per job).

My compressor threads compress each job's data with zlib and store the result in that job's outbuf.

Now I want to merge all of these into one output file in standard zlib format.

From the zlib RFC, and after browsing the source of pigz, I understand the following.

A zlib stream is laid out as follows:

     +---+---+
     |CMF|FLG| (2 bytes)
     +---+---+
     +---+---+---+---+
     |     DICTID    | (4 bytes. Present only when FLG.FDICT is set)
     +---+---+---+---+
     +=====================+
     |...compressed data...| (variable size of data)
     +=====================+
     +---+---+---+---+
     |     ADLER32   |  (4 bytes, Adler-32 checksum of the uncompressed data)
     +---+---+---+---+
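For reference, the fixed parts of this layout can be written by hand. The sketch below assumes the usual 32 KB window and no preset dictionary; zlib_header and zlib_trailer are hypothetical helper names. The field values follow RFC 1950: CMF 0x78 means deflate with a 32 KB window, FCHECK is chosen so that CMF*256 + FLG is a multiple of 31, and the trailer is the Adler-32 of the uncompressed data, most significant byte first.

```c
#include <assert.h>

/* Fill in the 2-byte zlib header (CMF, FLG) for a deflate stream with a
   32 KB window and no preset dictionary. flevel is the informational
   FLEVEL field: 0 = fastest ... 3 = maximum compression. */
static void zlib_header(unsigned char hdr[2], unsigned flevel)
{
    unsigned cmf = 0x78;                 /* CM = 8 (deflate), CINFO = 7 (32 KB) */
    unsigned flg = (flevel & 3u) << 6;   /* FDICT = 0: no DICTID follows */
    flg += 31 - (cmf * 256 + flg) % 31;  /* FCHECK: make CMF*256+FLG a multiple of 31 */
    assert((cmf * 256 + flg) % 31 == 0);
    hdr[0] = (unsigned char)cmf;
    hdr[1] = (unsigned char)flg;
}

/* Fill in the 4-byte trailer: the Adler-32 of the uncompressed data,
   most significant byte first. */
static void zlib_trailer(unsigned char trl[4], unsigned long adler)
{
    trl[0] = (unsigned char)(adler >> 24);
    trl[1] = (unsigned char)(adler >> 16);
    trl[2] = (unsigned char)(adler >> 8);
    trl[3] = (unsigned char)adler;
}
```

With flevel 2 the header comes out as 78 9C, which is what zlib writes at its default compression level.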

In my case there is no preset dictionary either.

So when combining the compressed units, every unit has the same header.

Hence, I do the following:

  1. For the first unit, I write the header and the compressed data.

  2. For the second through the last unit, I write only the compressed data (no header and no trailer).

  3. After all the units are done, I use adler32_combine() to merge the checksums of all the jobs' output data into one final Adler-32, and write it at the end of the output file.

But the problem is that decompressing the result fails with an error saying the data is invalid in places.

Has anyone already tried something like this? Any relevant information would be really helpful.


Answer 1:


You cannot simply concatenate raw deflate data streams. Each deflate stream is self-terminating, and so decompression would end at the end of the first stream.

You need to look more carefully at how the pigz code merges deflate streams. For each piece except the last, use Z_SYNC_FLUSH to complete the last block and bring the output to a byte boundary without ending the deflate stream. Then you can finish the deflate stream and strip off the empty block that was added and marked as the final block. The last piece should be terminated normally. You can then concatenate the n-1 unterminated deflate streams followed by the 1 terminated deflate stream.



Source: https://stackoverflow.com/questions/30794053/how-to-use-multiple-threads-for-zlib-compression-same-input-source
