I\'m writing a Python program to extract data from the middle of a 6 GB bz2 file. A bzip2 file is made up of independently decryptable blocks of data, so I only need to find
In addition to the one-shot decompress
function, the bz2 module also contains a class BZ2Decompressor
that decompresses data as it is fed to the decompress method. It therefore does not care about the end-of-file checksum and provides the data needed once it reaches the end of the block.
To illustrate, assume I have located the block I wish to extract from the file and stored it in a bitarray.bitarray instance (other bit-twiddling modules will probably work as well). Then this function will decode it:
def bunzip2_block(block):
from bz2 import BZ2Decompressor
from bitarray import bitarray
dummy_file = bitarray(endian="big")
dummy_file.frombytes("BZh9")
dummy_file += block
decompressor = BZ2Decompressor()
return decompressor.decompress(dummy_file.tobytes())
Note that the frombytes
and tobytes
methods of bitarray were previously called fromstring
and tostring
.