How to check empty gzip file in Python

太阳男子 2021-01-12 01:46

I don't want to use OS commands, as that would make the solution OS-dependent.

This is available in tarfile, tarfile.is_tarfile(filename), to check if

8 answers
  •  醉话见心
    2021-01-12 02:11

    Unfortunately, the gzip module does not expose any functionality equivalent to the -l list option of the gzip program. But in Python 3 you can easily get the size of the uncompressed data by calling the .seek method with a whence argument of 2, which signifies positioning relative to the end of the (uncompressed) data stream.

    .seek returns the new byte position, so .seek(0, 2) returns the byte offset of the end of the uncompressed file, i.e., the file size. Thus if the uncompressed file is empty the .seek call will return 0.

    import gzip
    
    def gz_size(fname):
        with gzip.open(fname, 'rb') as f:
            return f.seek(0, whence=2)
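
As a quick check of the seek-based approach, here is a minimal sketch that assumes Python 3 and writes two throwaway files to a temporary directory. The helper `make_gz` and the file names are hypothetical and exist only for this demonstration. Note that seeking to the end still decompresses the whole stream internally, so it saves memory rather than time.

```python
import gzip
import os
import tempfile

def gz_size(fname):
    # Seek to the end of the uncompressed stream; the return value is its size.
    with gzip.open(fname, 'rb') as f:
        return f.seek(0, 2)

def make_gz(path, payload):
    # Hypothetical helper for this demo: write payload as a gzip file.
    with gzip.open(path, 'wb') as f:
        f.write(payload)

tmpdir = tempfile.mkdtemp()
empty_path = os.path.join(tmpdir, 'empty.gz')
full_path = os.path.join(tmpdir, 'full.gz')
make_gz(empty_path, b'')
make_gz(full_path, b'hello')

print(gz_size(empty_path))  # 0
print(gz_size(full_path))   # 5
```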
    

    Here's a function that will work on Python 2, tested on Python 2.6.6.

    def gz_size(fname):
        f = gzip.open(fname, 'rb')
        data = f.read()
        f.close()
        return len(data)
    

    You can read about .seek and other methods of the GzipFile class using the pydoc program. Just run pydoc gzip in the shell.


    Alternatively, if you wish to avoid decompressing the file, you can (sort of) read the uncompressed data size directly from the .gz file. The size is stored in the last 4 bytes of the file as a little-endian unsigned 32-bit integer, so it is actually the size modulo 2**32. It will therefore not be the true size if the uncompressed data is 4 GiB or larger.

    This code works on both Python 2 and Python 3.

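    import struct
    
    def gz_size(fname):
        with open(fname, 'rb') as f:
            # The last 4 bytes hold the uncompressed size modulo 2**32
            f.seek(-4, 2)
            data = f.read(4)
        # '<L' means little-endian unsigned 32-bit integer
        size = struct.unpack('<L', data)[0]
        return size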

    However, this method is not reliable, as Mark Adler (gzip co-author) mentions in the comments:

    There are other reasons that the length at the end of the gzip file would not represent the length of the uncompressed data. (Concatenated gzip streams, padding at the end of the gzip file.) It should not be used for this purpose. It's only there as an integrity check on the data.
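
The concatenated-stream case from that comment can be illustrated with a small sketch. It assumes Python 3.2 or later (for `gzip.compress` and `gzip.decompress`) and works on in-memory bytes; the helper name `trailer_size` is hypothetical. Two gzip members are joined, the second one empty, so the trailer reports 0 even though the combined stream decompresses to 11 bytes.

```python
import gzip
import struct

def trailer_size(blob):
    # Read the last 4 bytes as a little-endian unsigned 32-bit integer.
    return struct.unpack('<L', blob[-4:])[0]

first = gzip.compress(b'hello world')   # member with 11 bytes of data
second = gzip.compress(b'')             # member with no data
combined = first + second               # still a valid gzip stream

print(trailer_size(combined))           # 0
print(len(gzip.decompress(combined)))   # 11
```

A check based only on the trailer would wrongly report this stream as empty.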


    Here is another solution. It does not decompress the whole file. It returns True if the uncompressed data in the input file is of zero length, but it also returns True if the input file itself is of zero length. If the input file is not of zero length and is not a gzip file then OSError is raised.

    import gzip
    
    def gz_is_empty(fname):
        ''' Test if gzip file fname is empty
            Return True if the uncompressed data in fname has zero length
            or if fname itself has zero length
            Raises OSError if fname has non-zero length and is not a gzip file
        '''
        with gzip.open(fname, 'rb') as f:
            data = f.read(1)
        return len(data) == 0
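
The behaviour described above can be exercised with a short sketch, assuming Python 3. The helper `write_raw` and the file names are hypothetical and used only for illustration. It covers four cases: an empty gzip file, a non-empty gzip file, a zero-length file, and a file that is not gzip at all.

```python
import gzip
import os
import tempfile

def gz_is_empty(fname):
    # Reading a single byte is enough to tell whether any data exists.
    with gzip.open(fname, 'rb') as f:
        return len(f.read(1)) == 0

def write_raw(path, raw):
    # Hypothetical helper for this demo: write raw bytes to path.
    with open(path, 'wb') as f:
        f.write(raw)

tmpdir = tempfile.mkdtemp()
empty_gz = os.path.join(tmpdir, 'empty.gz')
full_gz = os.path.join(tmpdir, 'full.gz')
zero_len = os.path.join(tmpdir, 'zero.gz')
not_gz = os.path.join(tmpdir, 'plain.txt')

write_raw(empty_gz, gzip.compress(b''))
write_raw(full_gz, gzip.compress(b'data'))
write_raw(zero_len, b'')
write_raw(not_gz, b'not gzip data')

print(gz_is_empty(empty_gz))  # True
print(gz_is_empty(full_gz))   # False
print(gz_is_empty(zero_len))  # True

try:
    gz_is_empty(not_gz)
    raised = False
except OSError:
    # Non-gzip input with non-zero length is rejected.
    raised = True
print(raised)                 # True
```

If a zero-length input file should be treated as an error rather than as empty, one option is to check `os.path.getsize(fname)` before opening the file.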
    
