I'm experimenting with the lzma module in Python 2.7.6 to see whether I can create compressed files in the XZ format for a future project that will use it. The code I used for the experiment was:
    import lzma as xz

    in_file = open('/home/ki2ne/Desktop/song.wav', 'rb')
    input_data = in_file.read()
    compressed_data = xz.compress(input_data)

    out_file = open('/home/ki2ne/Desktop/song.wav.xz', 'wb')
    out_file.write(compressed_data)  # write the compressed bytes to disk
    in_file.close()
    out_file.close()
I noticed that the resulting file had different MD5 and SHA256 checksums from the file produced by the plain xz command, although I could decompress either file fine and the checksums of the decompressed versions were the same. Would this be a problem?
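To illustrate how two valid .xz files can differ byte for byte and still hold the same data, here is a minimal sketch. It assumes the `lzma` module from the Python 3.3+ standard library (not the Python 2.7 module used above), and it uses the integrity-check setting only as one example of an encoder option that changes the output. It does not claim that this option is what caused the mismatch described here.

```python
import hashlib
import lzma  # assumption: Python 3.3+ standard library module

# Illustrative payload, standing in for the real input file.
data = b"some random phrases\n" * 1000

# Compress the same input twice, changing only the integrity check
# stored inside the .xz container.
xz_crc64 = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)
xz_crc32 = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32)

# Compare the compressed bytes by SHA256, then compare the decompressed data.
same_compressed = (
    hashlib.sha256(xz_crc64).hexdigest() == hashlib.sha256(xz_crc32).hexdigest()
)
same_decompressed = lzma.decompress(xz_crc64) == lzma.decompress(xz_crc32)

print("compressed files identical:  ", same_compressed)
print("decompressed data identical: ", same_decompressed)
```

In this sketch the compressed checksums differ while the decompressed data matches, which is the same pattern as in the question: a checksum of the .xz file reflects the encoder settings as well as the content.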
UPDATE: I fixed this by installing the backport of the Python 3.3 module from peterjc's Git repository (link here), and the checksums are now identical. Not sure if it matters, but I made sure the LZMA Python module in my repository wasn't installed, to avoid possible name conflicts.
Here's my test code to confirm this:
    # I have created two identical text files with some random phrases
    from subprocess import call
    from hashlib import sha256
    from backports import lzma as xz

    f2 = open("test2.txt", 'rb')
    f2_buf = buffer(f2.read())
    call(["xz", "test1.txt"])
    f2_xzbuf = buffer(xz.compress(f2_buf))
    f1 = open("test1.txt.xz", 'rb')
    f1_xzbuf = buffer(f1.read())
    f1.close()
    f2.close()

    f1sum = sha256()
    f2sum = sha256()
    f1sum.update(f1_xzbuf)
    f2sum.update(f2_xzbuf)

    if f1sum.hexdigest() == f2sum.hexdigest():
        print "Checksums OK"
    else:
        print "Checksum Error"
I've also verified this with the regular sha256sum tool, after writing the compressed data to a file.
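The on-disk check can also be done from Python. This is a rough sketch, assuming Python 3.3+ and its standard `lzma` module. The helper `sha256_of_file`, the temporary directory, and the sample payload are hypothetical names and values introduced for the example. The helper hashes the file in chunks, so it should give the same hex digest that sha256sum prints for that file.

```python
import hashlib
import lzma  # assumption: Python 3.3+ standard library module
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Illustrative payload, standing in for the contents of test2.txt.
payload = b"some random phrases\n" * 1000
compressed = lzma.compress(payload)

# Write the compressed bytes to a file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
xz_path = os.path.join(tmp_dir, "test2.txt.xz")
with open(xz_path, "wb") as out_file:
    out_file.write(compressed)

# The digest of the file on disk should match the digest of the buffer.
on_disk = sha256_of_file(xz_path)
in_memory = hashlib.sha256(compressed).hexdigest()
print(on_disk == in_memory)
```

Reading in chunks keeps memory use flat for large inputs such as a .wav file, which is why the sketch avoids calling `read()` on the whole file.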