Downloading and unzipping a .zip file without writing to disk

前端 未结 9 1583
囚心锁ツ
囚心锁ツ 2020-12-02 04:42

I have managed to get my first python script to work which downloads a list of .ZIP files from a URL and then proceeds to extract the ZIP files and writes them to disk.

9条回答
  •  盖世英雄少女心
    2020-12-02 05:14

    I'd like to offer an updated Python 3 version of Vishal's excellent answer, which was using Python 2, along with some explanation of the adaptations / changes, which may have been already mentioned.

    from io import BytesIO
    from zipfile import ZipFile
    import urllib.request
        
    url = urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/loc162txt.zip")
    
    with ZipFile(BytesIO(url.read())) as my_zip_file:
        for contained_file in my_zip_file.namelist():
            # with open(("unzipped_and_read_" + contained_file + ".file"), "wb") as output:
            for line in my_zip_file.open(contained_file).readlines():
                print(line)
                # output.write(line)
    

    Necessary changes:

    • There's no StringIO module in Python 3 (it's been moved to io.StringIO). Instead, I use io.BytesIO]2, because we will be handling a bytestream -- Docs, also this thread.
    • urlopen:
      • "The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen.", Docs and this thread.

    Note:

    • In Python 3, the printed output lines will look like so: b'some text'. This is expected, as they aren't strings - remember, we're reading a bytestream. Have a look at Dan04's excellent answer.

    A few minor changes I made:

    • I use with ... as instead of zipfile = ... according to the Docs.
    • The script now uses .namelist() to cycle through all the files in the zip and print their contents.
    • I moved the creation of the ZipFile object into the with statement, although I'm not sure if that's better.
    • I added (and commented out) an option to write the bytestream to file (per file in the zip), in response to NumenorForLife's comment; it adds "unzipped_and_read_" to the beginning of the filename and a ".file" extension (I prefer not to use ".txt" for files with bytestrings). The indenting of the code will, of course, need to be adjusted if you want to use it.
      • Need to be careful here -- because we have a byte string, we use binary mode, so "wb"; I have a feeling that writing binary opens a can of worms anyway...
    • I am using an example file, the UN/LOCODE text archive:

    What I didn't do:

    • NumenorForLife asked about saving the zip to disk. I'm not sure what he meant by it -- downloading the zip file? That's a different task; see Oleh Prypin's excellent answer.

    Here's a way:

    import urllib.request
    import shutil
    
    with urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/2015-2_UNLOCODE_SecretariatNotes.pdf") as response, open("downloaded_file.pdf", 'w') as out_file:
        shutil.copyfileobj(response, out_file)
    

提交回复
热议问题