urllib.urlretrieve
returns silently even if the file doesn\'t exist on the remote http server, it just saves a html page to the named file. For example:
Results against another server/website - what comes back in "B" is a bit random, but one can test for certain values:
A: get_good.jpg
B: Date: Tue, 08 Mar 2016 00:44:19 GMT
Server: Apache
Last-Modified: Sat, 02 Jan 2016 09:17:21 GMT
ETag: "524cf9-18afe-528565aef9ef0"
Accept-Ranges: bytes
Content-Length: 101118
Connection: close
Content-Type: image/jpeg
A: get_bad.jpg
B: Date: Tue, 08 Mar 2016 00:44:20 GMT
Server: Apache
Content-Length: 1363
X-Frame-Options: deny
Connection: close
Content-Type: text/html
In the 'bad' case (non-existing image file) "B" retrieved a small chunk of (Googlebot?) HTML code and saved it as the target, hence Content-Length of 1363 bytes.
I keep it simple:
# Simple downloading with progress indicator, by Cees Timmerman, 16mar12.
import urllib2
remote = r"http://some.big.file"
local = r"c:\downloads\bigfile.dat"
u = urllib2.urlopen(remote)
h = u.info()
totalSize = int(h["Content-Length"])
print "Downloading %s bytes..." % totalSize,
fp = open(local, 'wb')
blockSize = 8192 #100000 # urllib.urlretrieve uses 8192
count = 0
while True:
chunk = u.read(blockSize)
if not chunk: break
fp.write(chunk)
count += 1
if totalSize > 0:
percent = int(count * blockSize * 100 / totalSize)
if percent > 100: percent = 100
print "%2d%%" % percent,
if percent < 100:
print "\b\b\b\b\b", # Erase "NN% "
else:
print "Done."
fp.flush()
fp.close()
if not totalSize:
print