Python gzip refuses to read uncompressed file

一整个雨季 2021-01-17 13:31

I seem to remember that the Python gzip module previously allowed you to read non-gzipped files transparently. This was really useful, as it allowed you to read an input file wh
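For reference, here is a minimal sketch of what current Python 3 does and how to get the transparent behaviour back, assuming Python 3.8 or later. gzip.open() does not inspect the file when it is opened; the first read raises gzip.BadGzipFile (a subclass of OSError) if the data does not start with the gzip magic number. Catching that error and re-reading the file as plain bytes restores the fallback. The helper name read_maybe_gzipped is hypothetical, and it reads the whole file into memory for simplicity.

```python
import gzip


def read_maybe_gzipped(path):
    """Return the contents of path, decompressing only if it is gzip data.

    Hypothetical helper; assumes Python 3.8+ (earlier versions raise a
    plain OSError instead of gzip.BadGzipFile).
    """
    try:
        with gzip.open(path, 'rb') as f:
            # The gzip header is only checked here, on the first read.
            return f.read()
    except gzip.BadGzipFile:
        # Not gzip data: fall back to reading the raw bytes.
        with open(path, 'rb') as f:
            return f.read()
```

This still reads the first two bytes of every non-gzip file twice, which is usually negligible.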

4 Answers
  •  一个人的身影
    2021-01-17 14:25

    The best solution for this would be to use something like https://github.com/ahupp/python-magic, a Python binding for libmagic. You simply cannot avoid reading at least the header of a file to identify it (unless you implicitly trust file extensions).

    If you're feeling spartan, note that gzip(1) files are identified by their first two bytes, the magic number 0x1f 0x8b.

    In [1]: f = open('foo.html.gz', 'rb')
    In [2]: f.read(2)
    Out[2]: b'\x1f\x8b'
    

    gzip.open is just a wrapper around GzipFile, so you could write a function like this one, which returns the right kind of file object depending on the file's contents without having to open the file twice:

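    #!/usr/bin/env python3
    
    import gzip
    
    GZIP_MAGIC = b'\x1f\x8b'  # every gzip file starts with these two bytes
    
    def opener(filename):
        """Return a binary file object for filename, transparently
        decompressing it if it starts with the gzip magic number."""
        f = open(filename, 'rb')
        magic = f.read(2)  # a file opened in 'rb' mode returns bytes
        f.seek(0)  # rewind so the caller (or GzipFile) sees the whole file
        if magic == GZIP_MAGIC:
            return gzip.GzipFile(fileobj=f)
        return f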
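
    The opener above relies on f.seek(0), so it cannot be used on non-seekable input such as a pipe or sys.stdin.buffer. A minimal sketch of a variant for that case, assuming a binary stream: io.BufferedReader.peek() returns buffered bytes without advancing the stream position, so the magic number can be inspected without consuming it. The name open_maybe_gzipped is hypothetical.

```python
import gzip
import io

GZIP_MAGIC = b'\x1f\x8b'


def open_maybe_gzipped(stream):
    """Wrap a binary stream so reads return decompressed data if it starts
    with the gzip magic number, and the raw data otherwise. No seeking."""
    if not hasattr(stream, 'peek'):
        # e.g. io.BytesIO has no peek(); a BufferedReader adds one.
        stream = io.BufferedReader(stream)
    # peek() may return more bytes than requested, so keep only two. It does
    # at most one raw read, so on a pipe it could in rare cases return fewer.
    head = stream.peek(2)[:2]
    if head == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=stream, mode='rb')
    return stream
```

    For example, open_maybe_gzipped(sys.stdin.buffer).read() would accept both `gzip -c file | script.py` and `cat file | script.py`.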
    
