How to know if urllib.urlretrieve succeeds?

后端未结

关注

 8  1917

urllib.urlretrieve returns silently even if the file doesn\'t exist on the remote http server, it just saves a html page to the named file. For example:

相关标签:

8条回答

臣服心动

2020-11-30 01:49

Results against another server/website - what comes back in "B" is a bit random, but one can test for certain values:

A: get_good.jpg
B: Date: Tue, 08 Mar 2016 00:44:19 GMT
Server: Apache
Last-Modified: Sat, 02 Jan 2016 09:17:21 GMT
ETag: "524cf9-18afe-528565aef9ef0"
Accept-Ranges: bytes
Content-Length: 101118
Connection: close
Content-Type: image/jpeg

A: get_bad.jpg
B: Date: Tue, 08 Mar 2016 00:44:20 GMT
Server: Apache
Content-Length: 1363
X-Frame-Options: deny
Connection: close
Content-Type: text/html

In the 'bad' case (non-existing image file) "B" retrieved a small chunk of (Googlebot?) HTML code and saved it as the target, hence Content-Length of 1363 bytes.

0 讨论(0)

后悔当初

2020-11-30 01:51

I keep it simple:

# Simple downloading with progress indicator, by Cees Timmerman, 16mar12.

import urllib2

remote = r"http://some.big.file"
local = r"c:\downloads\bigfile.dat"

u = urllib2.urlopen(remote)
h = u.info()
totalSize = int(h["Content-Length"])

print "Downloading %s bytes..." % totalSize,
fp = open(local, 'wb')

blockSize = 8192 #100000 # urllib.urlretrieve uses 8192
count = 0
while True:
    chunk = u.read(blockSize)
    if not chunk: break
    fp.write(chunk)
    count += 1
    if totalSize > 0:
        percent = int(count * blockSize * 100 / totalSize)
        if percent > 100: percent = 100
        print "%2d%%" % percent,
        if percent < 100:
            print "\b\b\b\b\b",  # Erase "NN% "
        else:
            print "Done."

fp.flush()
fp.close()
if not totalSize:
    print

0 讨论(0)

上一页 1 2