How to advance past a deflate byte sequence contained in a byte stream?

半腔热情 submitted on 2021-01-27 16:15:45

Question


I have a byte stream that is a concatenation of sections, where each section is composed of a header plus a deflated byte stream.

I need to split this byte stream into its sections, but the header only describes the data in its uncompressed form. It gives no hint about the compressed data length, so I cannot simply skip ahead in the stream and parse the next section.

So far, the only way I have found to advance past the deflated byte sequence is to parse it according to the deflate specification. From what I understood by reading the specification, a deflate stream is composed of blocks, which can be compressed blocks or literal blocks.

Literal blocks contain a size header which can be used to easily advance past it.
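As an illustration of that size header, here is a minimal Python sketch of skipping one literal (stored) block. The function name `skip_stored_block` is hypothetical, and the sketch assumes the block header starts on a byte boundary at `pos`, which holds for the first block of a stream but not in general. Note that deflate packs header bits starting from the least significant bit of each byte.

```python
import struct
import zlib


def skip_stored_block(buffer, pos):
    """Skip one stored block whose header byte sits at `pos`.

    Assumes the 3 header bits start at bit 0 of buffer[pos].
    Returns (position after the block, whether it was the final block).
    """
    header = buffer[pos]
    bfinal = header & 0b1          # bit 0: set on the last block of the stream
    btype = (header >> 1) & 0b11   # bits 1-2: block type, 0 means stored
    if btype != 0:
        raise ValueError("not a stored block")
    # The rest of the header byte is padding; LEN and NLEN follow as
    # little-endian 16-bit values, and NLEN is the one's complement of LEN.
    length, complement = struct.unpack_from("<HH", buffer, pos + 1)
    if length != (~complement & 0xFFFF):
        raise ValueError("LEN/NLEN mismatch")
    return pos + 1 + 4 + length, bool(bfinal)


if __name__ == "__main__":
    # Compression level 0 with wbits=-15 yields raw deflate made of stored blocks.
    compressor = zlib.compressobj(0, zlib.DEFLATED, -15)
    raw = compressor.compress(b"hello") + compressor.flush()
    print(skip_stored_block(raw, 0), len(raw))
```

This only covers the easy case; compressed blocks carry no such length field.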

Compressed blocks are composed of 'prefix codes', which are variable-length bit sequences that have special meanings to the deflate algorithm. Since I'm only interested in finding out the deflated stream length, I guess the only code I need to look for is '0000000', which according to the specification signals the end of a block.

So I came up with this CoffeeScript function to parse the deflate stream (I'm working on Node.js):

# The job of this function is to return the position
# after the deflate stream contained in 'buffer'. The
# deflated stream begins at 'pos'.
advanceDeflateStream = (buffer, pos) ->
  byteOffset = 0
  finalBlock = false
  while 1
    if byteOffset == 6
      firstTypeBit = 0b00000001 & buffer[pos]
      pos++
      secondTypeBit = 0b10000000 & buffer[pos]
      type = firstTypeBit | (secondTypeBit << 1)
    else
      if byteOffset == 7
        pos++
      type = buffer[pos] & (0b01100000 >>> byteOffset)
    if type == 0
      # Literal block
      # ignore the remaining bits and advance position
      byteOffset = 0
      pos++
      len = buffer.readUInt16LE(pos)
      pos += 2
      lenComplement = buffer.readUInt16LE(pos)
      if (len ^ ~lenComplement)
        throw new Error('Literal block length check fail')
      pos += (2 + len) # Advance past literal block
    else if type in [1, 2]
      # huffman block
      # we are only interested in finding the 'block end' marker
      # which is signaled by the bit string 0000000 (256)
      eob = false
      matchedZeros = 0
      while !eob
        byte = buffer[pos]
        for i in [byteOffset..7]
          # loop the remaining bits looking for 7 consecutive zeros
          if (byte ^ (0b10000000 >>> byteOffset)) >>> (7 - byteOffset)
            matchedZeros++
          else
            # reset counter
            matchedZeros = 0
          if matchedZeros == 7
            eob = true
            break
          byteOffset++
        if !eob
          byteOffset = 0
          pos++
    else
      throw new Error('Invalid deflate block')
    finalBlock = buffer[pos] & (0b10000000 >>> byteOffset)
    if finalBlock
      break
  return pos

To check if this works, I wrote a simple mocha test case:

zlib = require 'zlib'

test 'sample deflate stream', (done) ->
  data = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' # length 30   
  zlib.deflate data, (err, deflated) ->
    # deflated.length == 11
    advanceDeflateStream(deflated, 0).should.eql(11)
    done()

The problem is that this test fails and I do not know how to debug it. I will accept any answer that points out what I missed in the parsing algorithm or contains a correct version of the above function in any language.


Answer 1:


The only way to find the end of a deflate stream, or even of a single deflate block, is to decode all of the Huffman codes contained within. There is no bit pattern you can search for that cannot also appear earlier in the stream.
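In practice, decoding does not have to be hand-written: an existing inflate implementation can decode the stream and report how much input it consumed. The following is a minimal Python sketch of that idea, not a port of the function in the question. The helper name `deflate_end` and its `raw` parameter are assumptions made for illustration; `zlib.decompressobj`, `eof` and `unused_data` are part of Python's standard `zlib` module.

```python
import zlib


def deflate_end(buffer, pos=0, raw=True):
    """Return the offset just past the compressed stream starting at `pos`.

    raw=True expects headerless deflate data (wbits=-15);
    raw=False expects a zlib header and Adler-32 trailer (wbits=15).
    """
    decoder = zlib.decompressobj(-15 if raw else 15)
    # Decoding stops at the end-of-stream marker of the final block.
    decoder.decompress(buffer[pos:])
    if not decoder.eof:
        raise ValueError("compressed stream is truncated")
    # Whatever followed the stream is left untouched in unused_data.
    return len(buffer) - len(decoder.unused_data)


def raw_deflate(data):
    """Helper for the demo: compress `data` as raw deflate."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    first = raw_deflate(b"a" * 30)
    second = raw_deflate(b"another section")
    # Two sections, each with a made-up 4-byte header.
    stream = b"HDR1" + first + b"HDR2" + second
    end_of_first = deflate_end(stream, 4)
    print(end_of_first == 4 + len(first))
    print(deflate_end(stream, end_of_first + 4) == len(stream))
```

The `raw` switch matters for the test case in the question: Node's `zlib.deflate` emits a zlib-wrapped stream with a two-byte header and a checksum trailer, while `zlib.deflateRaw` emits the headerless form, so the wrapper has to be accounted for whichever decoder is used.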



Source: https://stackoverflow.com/questions/14206518/how-to-advance-past-a-deflate-byte-sequence-contained-in-a-byte-stream
