How to check empty gzip file in Python

太阳男子 2021-01-12 01:46

I don't want to use OS commands, as that makes the solution OS-dependent.

This is available in tarfile as tarfile.is_tarfile(filename), to check if

8 Answers
  •  暗喜 (OP)
    2021-01-12 02:20

    Judging from the source code of the Python 2.7 gzip module, reading seems to return EOF immediately not only when the gzipped (decompressed) content is zero bytes, but also when the gzip file itself is zero bytes, which is arguably a bug.
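The ambiguity described above appears to carry over to Python 3: a zero-byte file and a gzip stream wrapping zero bytes both read back as b"". Below is a minimal sketch of telling the cases apart. It assumes that checking the on-disk size and the two-byte gzip magic number (0x1f 0x8b, from RFC 1952) is good enough. The function classify_gzip and its return labels are hypothetical and are not part of the gzip module:

```python
import gzip
import os

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes (RFC 1952)


def classify_gzip(path):
    """Hypothetical helper: return 'zero-byte file', 'not gzip',
    'empty payload' or 'has data' for the file at `path`."""
    # gzip.open() alone cannot tell a zero-byte file from an empty
    # payload, since both read back as b"", so check the raw size first.
    if os.path.getsize(path) == 0:
        return "zero-byte file"

    # Cheap header check, similar in spirit to tarfile.is_tarfile().
    with open(path, "rb") as raw:
        if raw.read(2) != GZIP_MAGIC:
            return "not gzip"

    # Only the header was checked above; a corrupt or truncated stream
    # can still raise OSError, EOFError or zlib.error from this read.
    with gzip.open(path, "rb") as f:
        # One decompressed byte is enough to know whether any data exists.
        return "empty payload" if f.read(1) == b"" else "has data"
```

For example, a file whose bytes are exactly gzip.compress(b"") should be reported as "empty payload", while a file created with open(path, "wb").close() should be reported as "zero-byte file".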

    However, for your particular use case we can do a little better by also confirming that the gzipped content is a valid CSV file.

    This code...

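    import csv
    import gzip
    import zlib
    
    # Returns True if the specified filename is a valid gzip'd CSV file.
    # If the optional 'columns' parameter is specified, also check that
    # the first non-blank row has that many columns.
    # Written for Python 3, where csv.reader needs text rather than bytes.
    def is_valid(filename, columns=None):
    
        try:
    
            # Chain a CSV reader onto a gzip reader opened in text mode
            with gzip.open(filename, 'rt', newline='') as gz_file:
                csv_file = csv.reader(gz_file)
    
                # This will try to read the first line.
                # If it's not a valid gzip file, this raises OSError
                # (gzip.BadGzipFile on 3.8+; IOError is an alias of OSError)
                for row in csv_file:
    
                    # csv.reader yields blank lines as empty lists; skip them
                    if not row:
                        continue
    
                    # We got at least one row
                    # Bail out here if we don't care how many columns we have
                    if columns is None:
                        return True
    
                    # Check it has the right number of columns
                    return len(row) == columns
    
                else:
    
                    # There were no non-blank rows
                    return False
    
        except (OSError, EOFError, zlib.error, csv.Error, UnicodeDecodeError):
    
            # Not a valid gzip file, a truncated or corrupt stream,
            # or content that can't be decoded or parsed as CSV text
            return False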
    
    
    # Example to check whether File.txt.gz is valid
    result = is_valid('File.txt.gz')
    
    # Example to check whether File.txt.gz is valid, and has three columns
    result = is_valid('File.txt.gz', columns=3)
    

    ...should correctly handle the following error cases...

    1. The gzip file is zero bytes
    2. The gzip file is not a valid gzip file
    3. The gzipped file is zero bytes
    4. The gzipped file is not zero bytes, but contains no CSV data
    5. (Optionally) The gzipped file contains CSV data, but with the wrong number of columns
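To try is_valid against each case in the list, you can generate sample files with the standard library alone. This is a sketch with hypothetical names (make_test_files, and case1.gz to case5.gz). It assumes that "no CSV data" in case 4 means the content is blank lines only, and the sample CSV content is made up for illustration:

```python
import gzip
import os


def make_test_files(directory):
    """Hypothetical helper: write one sample file per error case
    into `directory` and return a {case_number: path} dict."""
    paths = {n: os.path.join(directory, "case%d.gz" % n) for n in range(1, 6)}

    # 1. The gzip file itself is zero bytes
    open(paths[1], "wb").close()

    # 2. Not a valid gzip file (plain text with a .gz name)
    with open(paths[2], "wb") as f:
        f.write(b"a,b,c\n")

    # 3. Valid gzip whose decompressed content is zero bytes
    with gzip.open(paths[3], "wb"):
        pass

    # 4. Valid gzip with non-empty content but no CSV rows (assumed: blank lines)
    with gzip.open(paths[4], "wb") as f:
        f.write(b"\n\n")

    # 5. Valid CSV, but with two columns where three might be expected
    with gzip.open(paths[5], "wb") as f:
        f.write(b"a,b\n1,2\n")

    return paths
```

You can then pass each path to is_valid, using columns=3 for case 5.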
