How to tell if a file is gzip compressed?

挽巷 2021-01-03 19:21

I have a Python program which is going to take text files as input. However, some of these files may be gzip compressed.

Is there a cross-platform, usable from Py

6 answers
  •  南笙 (OP)
    2021-01-03 19:56

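    The gzip module itself raises an OSError when you try to read from a file that is not gzipped. The error comes from the first read, not from the call to gzip.open().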

    >>> import gzip
    >>> with gzip.open('README.md', 'rb') as f:
    ...     f.read()
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/Users/dennis/.asdf/installs/python/3.6.6/lib/python3.6/gzip.py", line 276, in read
        return self._buffer.read(size)
      File "/Users/dennis/.asdf/installs/python/3.6.6/lib/python3.6/gzip.py", line 463, in read
        if not self._read_gzip_header():
      File "/Users/dennis/.asdf/installs/python/3.6.6/lib/python3.6/gzip.py", line 411, in _read_gzip_header
        raise OSError('Not a gzipped file (%r)' % magic)
    OSError: Not a gzipped file (b'# ')
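
    As a rough sketch, the behaviour above could be wrapped in a small helper. The name is_gzip_by_reading is made up for illustration, and it assumes that reading a single byte is enough to trigger the header check, as it does in the traceback above.

```python
import gzip


def is_gzip_by_reading(path):
    """Return True if the gzip module can start reading `path`."""
    try:
        with gzip.open(path, "rb") as f:
            f.read(1)  # the gzip header is only validated on the first read
        return True
    except OSError:
        # Raised for a bad magic number (gzip.BadGzipFile, a subclass of
        # OSError, on Python 3.8+). A missing or unreadable file also
        # lands here, because those errors are OSError subclasses too.
        return False
```

    This only confirms that the file starts like a gzip stream. A truncated or corrupted archive can still fail later with a different exception while the rest of the data is being read.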
    

    You can combine this approach with others to increase confidence, such as checking the extension, checking the MIME type, or looking for the magic number in the file header (see the other answers for an example).

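    import gzip
    import pathlib

    # filepath is the path of the input file being tested
    if '.gz' in pathlib.Path(filepath).suffixes:
        # some more inexpensive checks until confident we can attempt to decompress
        # ...
        try:
            with gzip.open(filepath, 'rb') as f:
                f.read(1)  # the first read validates the gzip header
        except OSError as e:
            # not gzip data despite the extension
            ...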
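
    For the magic number check mentioned above, a minimal sketch might look like the following. A gzip stream starts with the two bytes 0x1f 0x8b, which is what the gzip module compared against in the traceback. The helper names has_gzip_magic and open_text are hypothetical, and the UTF-8 default is an assumption about the input files.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (RFC 1952)


def has_gzip_magic(path):
    """Return True if `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_text(path, encoding="utf-8"):
    """Open `path` for reading text, decompressing it if it looks gzipped."""
    if has_gzip_magic(path):
        # "rt" makes gzip.open return decoded text instead of bytes
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

    The caller can then use open_text(path) in a with statement and iterate over lines without caring whether the input was compressed. The check ignores the file extension, so it also handles gzipped files that lack a .gz suffix.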
    
