Python CRC-32 woes

既然无缘 2020-12-06 13:54

I'm writing a Python program to extract data from the middle of a 6 GB bz2 file. A bzip2 file is made up of independently decompressible blocks of data, so I only need to find

2 answers
  •  挽巷 (original poster)
    2020-12-06 14:38

    In addition to the one-shot decompress function, the bz2 module also contains a class, BZ2Decompressor, that decompresses data incrementally as it is fed to its decompress method. It therefore does not depend on the end-of-stream checksum and returns a block's data once it reaches the end of that block.
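
The difference between the two interfaces can be checked with the standard library alone. The following is a minimal sketch, not part of the original answer: the sample payload and the helper name one_shot_fails are made up for illustration, and the stream is truncated by dropping its last 4 bytes so that the trailing checksum is incomplete.

```python
import bz2

# Hypothetical sample data, small enough to fit in a single bzip2 block.
payload = b"hello bzip2 " * 20
stream = bz2.compress(payload)

# A bzip2 stream ends with a 48-bit end marker and a 32-bit CRC, padded to a
# byte boundary, so the last 4 bytes hold only checksum and padding bits.
truncated = stream[:-4]


def one_shot_fails(data):
    """Return True if the one-shot bz2.decompress rejects the data."""
    try:
        bz2.decompress(data)
    except (ValueError, OSError):
        return True
    return False


# The incremental decompressor hands back whatever it has decoded so far.
decompressor = bz2.BZ2Decompressor()
recovered = decompressor.decompress(truncated)

print(one_shot_fails(truncated))   # the one-shot call needs the full stream
print(recovered == payload)        # the block itself was still decoded
print(decompressor.eof)            # the end-of-stream marker was never completed
```

The sample is kept deliberately small; behaviour with very large blocks or other truncation points is not covered by this sketch.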

    To illustrate, assume I have located the block I wish to extract from the file and stored it in a bitarray.bitarray instance (other bit-twiddling modules will probably work as well). Then this function will decode it:

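    from bz2 import BZ2Decompressor
    from bitarray import bitarray

    def bunzip2_block(block):
        """Decompress one bzip2 block given as a bitarray of its raw bits."""
        # Prepend a stream header; the "9" assumes the file uses 900 kB blocks.
        dummy_file = bitarray(endian="big")
        dummy_file.frombytes(b"BZh9")  # must be bytes, not str, on Python 3
        dummy_file += block

        # tobytes() zero-pads the final partial byte.
        decompressor = BZ2Decompressor()
        return decompressor.decompress(dummy_file.tobytes())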
    

    Note that the frombytes and tobytes methods of bitarray were previously called fromstring and tostring.
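
The function above assumes the block has already been located. As a rough sketch of that step (not from the original answer), the code below scans a buffer bit by bit for the 48-bit marker 0x314159265359 that starts every bzip2 block. The name find_block_offsets is hypothetical, and the big-integer approach is only practical for small buffers; a multi-gigabyte file would need a chunked or bitarray-based search instead.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit marker at the start of each bzip2 block
MAGIC_BITS = 48


def find_block_offsets(data):
    """Return the bit offsets (from the start of data) where the block magic occurs."""
    total_bits = len(data) * 8
    stream = int.from_bytes(data, "big")  # treat the buffer as one big-endian integer
    mask = (1 << MAGIC_BITS) - 1
    offsets = []
    for offset in range(total_bits - MAGIC_BITS + 1):
        # Shift so the 48 bits starting at `offset` become the lowest bits.
        shift = total_bits - MAGIC_BITS - offset
        if (stream >> shift) & mask == BLOCK_MAGIC:
            offsets.append(offset)
    return offsets


# Hypothetical sample: a small single-block stream.
sample = bz2.compress(b"hello bzip2 " * 20)
offsets = find_block_offsets(sample)
print(sample[:4], offsets[0])
```

In a freshly compressed stream the first block follows the 4-byte "BZh9" header directly, so its marker sits at bit offset 32. Later blocks are generally not byte-aligned, which is why the search works on bits rather than bytes. A 48-bit pattern can in principle also occur by chance inside compressed data, so matches are candidates, not guarantees.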
