Python 2.7: Compressing data with the XZ format using the “lzma” module

南方客 2021-01-12 20:37

I'm experimenting with the lzma module in Python 2.7.6 to see whether I can create compressed files in the XZ format for a future project that will use it. My code

2 answers
  •  不要未来只要你来
    2021-01-12 21:07

    I would not be concerned about the differences in the compressed files. Depending on the container format and the checksum type used in the .xz file, the compressed bytes can differ even though the decompressed contents are identical.
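    A minimal sketch of this point, assuming Python 3's built-in lzma module (it is not part of the Python 2.7 standard library): the same input compressed with two different checksum types gives different .xz bytes, but both decompress to the same data.

```python
import lzma

data = b"hello xz " * 100

# Same input and default preset; only the integrity check type differs.
crc32_xz = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32)
crc64_xz = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)

# The compressed outputs are not byte-identical...
print(crc32_xz == crc64_xz)  # False
# ...but both decompress back to the original contents.
print(lzma.decompress(crc32_xz) == data, lzma.decompress(crc64_xz) == data)  # True True
```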

    EDIT: I've looked into this further and wrote this script to test the PyLZMA module for Python 2.x and the built-in lzma module in Python 3.x:

    from __future__ import print_function

    import os
    from hashlib import md5

    try:
        import lzma as xz  # built-in module in Python 3.3+
    except ImportError:
        import pylzma as xz  # third-party PyLZMA module for Python 2.x

    # compress with the xz command-line utility (-z compress, -k keep input, -f overwrite)
    os.system('xz -zkf test.txt')

    # now compress the same input with the library
    with open('test.txt', 'rb') as f, open('test.txt.xzpy', 'wb') as out:
        out.write(xz.compress(f.read()))  # f.read() already returns bytes in 'rb' mode

    # compare the two compressed files by hash
    with open('test.txt.xz', 'rb') as f1, open('test.txt.xzpy', 'rb') as f2:
        hash1 = md5(f1.read()).hexdigest()
        hash2 = md5(f2.read()).hexdigest()
        print(hash1, hash2)
        assert hash1 == hash2
    

    This compresses the file test.txt with both the xz command-line utility and the Python module, then compares the results. Under Python 3, lzma produces the same result as xz; under Python 2, however, PyLZMA produces a different result that the xz command-line utility cannot extract.
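    Since valid .xz files can legitimately differ byte for byte, an MD5 comparison is stricter than needed. A sketch of a looser check, assuming Python 3's lzma module; is_valid_xz is a hypothetical helper name. It tests for the .xz magic bytes and then confirms that decompressing returns the original input.

```python
import lzma

# Every .xz stream begins with these six magic bytes (FD 37 7A 58 5A 00).
XZ_MAGIC = b"\xfd7zXZ\x00"

def is_valid_xz(compressed, original):
    """Hypothetical helper: True if `compressed` is an .xz stream that
    decompresses back to `original`."""
    if not compressed.startswith(XZ_MAGIC):
        return False  # not an .xz container (e.g. legacy .lzma data)
    try:
        return lzma.decompress(compressed, format=lzma.FORMAT_XZ) == original
    except lzma.LZMAError:
        return False  # corrupt or truncated stream

data = b"some test data\n" * 50
print(is_valid_xz(lzma.compress(data), data))  # True
# FORMAT_ALONE writes the legacy .lzma format, which has no .xz header.
print(is_valid_xz(lzma.compress(data, format=lzma.FORMAT_ALONE), data))  # False
```

    To apply it to the script's output, read test.txt and test.txt.xzpy in binary mode and pass their contents in.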

    What module are you using that is called "lzma" in Python2 and what command did you use to compress the data?

    EDIT 2: Okay, I found the pyliblzma module for Python 2. However, it seems to use CRC32 as the default checksum algorithm (the others use CRC64), and a bug prevents changing the checksum algorithm: https://bugs.launchpad.net/pyliblzma/+bug/1243344

    You could try compressing with xz -C crc32 to compare the results, but I still haven't managed to produce a valid compressed file with the Python 2 libraries.
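    To see which checksum a given .xz file actually declares (for example, when comparing pyliblzma output with xz -C crc32 output), you can read the check ID from the 12-byte stream header: 6 magic bytes, then two stream-flag bytes, where the low 4 bits of the second flag byte hold the check ID. A sketch assuming Python 3 (where indexing bytes gives an int); xz_check_name is a hypothetical helper.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"

# .xz check IDs; these match Python's lzma.CHECK_* constants.
CHECK_NAMES = {
    lzma.CHECK_NONE: "none",
    lzma.CHECK_CRC32: "crc32",
    lzma.CHECK_CRC64: "crc64",
    lzma.CHECK_SHA256: "sha256",
}

def xz_check_name(header):
    """Hypothetical helper: return the integrity check named in an .xz stream header."""
    if len(header) < 12 or not header.startswith(XZ_MAGIC):
        raise ValueError("not an .xz stream header")
    check_id = header[7] & 0x0F  # second stream-flags byte, low nibble
    return CHECK_NAMES.get(check_id, "unknown (id %d)" % check_id)

data = b"checksum demo\n" * 20
print(xz_check_name(lzma.compress(data)))                          # crc64 (default)
print(xz_check_name(lzma.compress(data, check=lzma.CHECK_CRC32)))  # crc32
```

    For a file on disk, pass in its first 12 bytes, e.g. open('test.txt.xz', 'rb').read(12).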
