Downloading and unzipping a .zip file without writing to disk

前端未结

关注

 9  1583

囚心锁ツ 2020-12-02 04:42

I have managed to get my first python script to work which downloads a list of .ZIP files from a URL and then proceeds to extract the ZIP files and writes them to disk.

9条回答

盖世英雄少女心 (楼主)

2020-12-02 05:14
I'd like to offer an updated Python 3 version of Vishal's excellent answer, which was using Python 2, along with some explanation of the adaptations / changes, which may have been already mentioned.
```
from io import BytesIO
from zipfile import ZipFile
import urllib.request
    
url = urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/loc162txt.zip")

with ZipFile(BytesIO(url.read())) as my_zip_file:
    for contained_file in my_zip_file.namelist():
        # with open(("unzipped_and_read_" + contained_file + ".file"), "wb") as output:
        for line in my_zip_file.open(contained_file).readlines():
            print(line)
            # output.write(line)
```
Necessary changes:
- There's no StringIO module in Python 3 (it's been moved to io.StringIO). Instead, I use io.BytesIO]2, because we will be handling a bytestream -- Docs, also this thread.
- urlopen:
  - "The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen.", Docs and this thread.
Note:
- In Python 3, the printed output lines will look like so: b'some text'. This is expected, as they aren't strings - remember, we're reading a bytestream. Have a look at Dan04's excellent answer.
A few minor changes I made:
- I use with ... as instead of zipfile = ... according to the Docs.
- The script now uses .namelist() to cycle through all the files in the zip and print their contents.
- I moved the creation of the ZipFile object into the with statement, although I'm not sure if that's better.
- I added (and commented out) an option to write the bytestream to file (per file in the zip), in response to NumenorForLife's comment; it adds "unzipped_and_read_" to the beginning of the filename and a ".file" extension (I prefer not to use ".txt" for files with bytestrings). The indenting of the code will, of course, need to be adjusted if you want to use it.
  - Need to be careful here -- because we have a byte string, we use binary mode, so "wb"; I have a feeling that writing binary opens a can of worms anyway...
- I am using an example file, the UN/LOCODE text archive:
What I didn't do:
- NumenorForLife asked about saving the zip to disk. I'm not sure what he meant by it -- downloading the zip file? That's a different task; see Oleh Prypin's excellent answer.
Here's a way:
```
import urllib.request
import shutil

with urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/2015-2_UNLOCODE_SecretariatNotes.pdf") as response, open("downloaded_file.pdf", 'w') as out_file:
    shutil.copyfileobj(response, out_file)
```
0 讨论(0)

查看其它9个回答
发布评论:

提交评论
- 加载中...