Download and decompress gzipped file in memory?

爱一瞬间的悲伤 2020-12-02 11:05

I would like to download a file using urllib and decompress the file in memory before saving.

This is what I have right now:



        
4 answers
  • 2020-12-02 11:40

    A one-liner that prints the decompressed file content (Python 2; needs the gzip, StringIO and urllib2 modules imported):

    print gzip.GzipFile(fileobj=StringIO.StringIO(urllib2.urlopen(DOWNLOAD_LINK).read()), mode='rb').read()
    
  • 2020-12-02 11:42

    For those using Python 3, the equivalent answer is:

    import urllib.request
    import io
    import gzip
    
    response = urllib.request.urlopen(FILE_URL)
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)
    
    with open(OUTFILE_PATH, 'wb') as outfile:
        outfile.write(decompressed_file.read())
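
The version above holds both the compressed and the decompressed bytes in memory at once. If the file is large, one option is to decompress in chunks while writing to disk. The following is only a sketch under that assumption: the helper name gunzip_stream_to_path and the chunk_size default are hypothetical, and in real use source would be the response returned by urllib.request.urlopen(FILE_URL).

```python
import gzip
import shutil


def gunzip_stream_to_path(source, out_path, chunk_size=64 * 1024):
    """Decompress a gzip stream read from `source` into the file `out_path`.

    `source` is any binary file-like object with a read() method, for
    example the response object returned by urllib.request.urlopen().
    """
    with gzip.GzipFile(fileobj=source, mode='rb') as decompressed:
        with open(out_path, 'wb') as outfile:
            # copyfileobj reads and writes up to chunk_size bytes at a time,
            # so the full decompressed content is never held in memory.
            shutil.copyfileobj(decompressed, outfile, chunk_size)
```

Because the helper only relies on read(), it can be tried locally with an io.BytesIO object that wraps gzip-compressed bytes before pointing it at a real download.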
    
  • 2020-12-02 11:57

    You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Otherwise the gzip module starts reading at the end of the buffer, and the file appears empty to it. See below (Python 2):

    #! /usr/bin/env python
    import urllib2
    import StringIO
    import gzip
    
    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    filename = "man-pages-3.34.tar.gz"
    outFilePath = "man-pages-3.34.tar"
    
    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())
    #
    # Set the file's current position to the beginning
    # of the file so that gzip.GzipFile can read
    # its contents from the top.
    #
    compressedFile.seek(0)
    
    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
    
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
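
The effect of the missing seek can be reproduced without any download. The snippet below is a small Python 3 illustration (using io.BytesIO in place of StringIO.StringIO) with a made-up payload; it is not part of the answer's original code.

```python
import gzip
import io

payload = b"example payload"  # made-up data standing in for a downloaded file

buffer = io.BytesIO()
buffer.write(gzip.compress(payload))  # the position is now at the end

# Without rewinding, gzip starts at the end of the buffer and finds no data.
empty = gzip.GzipFile(fileobj=buffer, mode='rb').read()

# After seek(0), the same buffer decompresses normally.
buffer.seek(0)
restored = gzip.GzipFile(fileobj=buffer, mode='rb').read()

# Passing the bytes to the BytesIO constructor leaves the position at 0,
# so no explicit seek is needed in that case.
direct = gzip.GzipFile(
    fileobj=io.BytesIO(gzip.compress(payload)), mode='rb'
).read()
```

Here empty is an empty bytes object, while restored and direct both equal payload.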
    
  • 2020-12-02 11:58

    If you have Python 3.2 or later, this is much easier, because gzip.decompress() is available:

    #!/usr/bin/env python3
    import gzip
    import urllib.request
    
    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    filename = "man-pages-4.03.tar.gz"
    outFilePath = filename[:-3]
    
    response = urllib.request.urlopen(baseURL + filename)
    with open(outFilePath, 'wb') as outfile:
        outfile.write(gzip.decompress(response.read()))
    

    For those who are interested in history, see https://bugs.python.org/issue3488 and https://hg.python.org/cpython/rev/3fa0a9553402.
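
The same approach can be wrapped in a small function that also closes the connection once the body has been read. This is a sketch, not the answer's original code: the function name download_and_gunzip is hypothetical, and it still loads the whole file into memory, as the answer does.

```python
import gzip
import urllib.request


def download_and_gunzip(url, out_path):
    """Fetch `url`, decompress the gzip body in memory, write it to `out_path`."""
    # The with-block closes the response (and its connection) on exit.
    with urllib.request.urlopen(url) as response:
        compressed = response.read()
    with open(out_path, 'wb') as outfile:
        outfile.write(gzip.decompress(compressed))
```

With the values from the answer, the call would be download_and_gunzip(baseURL + filename, outFilePath).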
