How to know if urllib.urlretrieve succeeds?

长情又很酷 2020-11-30 01:20

urllib.urlretrieve returns silently even if the file doesn't exist on the remote HTTP server; it just saves an HTML page to the named file. For example:

8 answers
  • 2020-11-30 01:49

    Results against another server/website. What comes back in "B" (the response headers) varies between servers, but one can test for certain values:

    A: get_good.jpg
    B: Date: Tue, 08 Mar 2016 00:44:19 GMT
    Server: Apache
    Last-Modified: Sat, 02 Jan 2016 09:17:21 GMT
    ETag: "524cf9-18afe-528565aef9ef0"
    Accept-Ranges: bytes
    Content-Length: 101118
    Connection: close
    Content-Type: image/jpeg
    
    A: get_bad.jpg
    B: Date: Tue, 08 Mar 2016 00:44:20 GMT
    Server: Apache
    Content-Length: 1363
    X-Frame-Options: deny
    Connection: close
    Content-Type: text/html
    

    In the 'bad' case (a nonexistent image file), the call retrieved a small chunk of (Googlebot?) HTML code and saved it as the target file, hence the Content-Length of 1363 bytes and the Content-Type of text/html.
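
The header test described above can be sketched as follows. This is a minimal Python 3 sketch, not the answerer's code: `looks_like_error_page` and its `expected_prefix` parameter are hypothetical names, and it assumes the requested file is an image, so that a `text/html` Content-Type signals an error page. In Python 3, `urlretrieve` lives in `urllib.request` and returns a `(filename, headers)` tuple whose second item is a message object.

```python
from email.message import Message


def looks_like_error_page(headers, expected_prefix="image/"):
    """Return True if the response headers do not match the expected media type.

    `headers` is the second item of the tuple returned by urlretrieve
    (an email.message.Message-like object). `expected_prefix` is an
    assumption about what the caller asked for, e.g. "image/".
    """
    content_type = headers.get_content_type()  # lowercased, e.g. "text/html"
    return not content_type.startswith(expected_prefix)


# Rebuild the two header sets shown above to exercise the check offline.
good = Message()
good["Content-Type"] = "image/jpeg"
good["Content-Length"] = "101118"

bad = Message()
bad["Content-Type"] = "text/html"
bad["Content-Length"] = "1363"

print(looks_like_error_page(good))  # False
print(looks_like_error_page(bad))   # True
```

With a real download, the headers object returned by `urllib.request.urlretrieve(url, filename)` can be passed in directly, and the saved file can be deleted when the check returns True.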

  • 2020-11-30 01:51

    I keep it simple:

    # Simple downloading with progress indicator, by Cees Timmerman, 16mar12.
    
    import urllib2
    
    remote = r"http://some.big.file"
    local = r"c:\downloads\bigfile.dat"
    
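    # Unlike urllib.urlretrieve, urllib2.urlopen raises urllib2.HTTPError on a 404.
    u = urllib2.urlopen(remote)
    h = u.info()
    # Content-Length may be missing; treat that as an unknown size (0).
    totalSize = int(h.get("Content-Length", 0))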
    
    print "Downloading %s bytes..." % totalSize,
    fp = open(local, 'wb')
    
    blockSize = 8192 #100000 # urllib.urlretrieve uses 8192
    count = 0
    while True:
        chunk = u.read(blockSize)
        if not chunk: break
        fp.write(chunk)
        count += 1
        if totalSize > 0:
            percent = int(count * blockSize * 100 / totalSize)
            if percent > 100: percent = 100
            print "%2d%%" % percent,
            if percent < 100:
                print "\b\b\b\b\b",  # Erase "NN% "
            else:
                print "Done."
    
    fp.flush()
    fp.close()
    if not totalSize:
        print
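
The same loop can be sketched in Python 3, where `urllib2` became `urllib.request`. This is an illustrative sketch under stated assumptions, not the answerer's code: `copy_with_progress` and `download` are hypothetical names, the copy loop is factored out so it can be exercised without a network connection, and progress is computed from the bytes actually written rather than from the block count.

```python
import io
import urllib.error
import urllib.request


def copy_with_progress(src, dst, total_size=0, block_size=8192, report=None):
    """Copy src to dst in blocks; return the number of bytes written.

    `report`, if given, is called with an integer percentage after each
    block when total_size is known (greater than zero).
    """
    written = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        dst.write(chunk)
        written += len(chunk)  # count real bytes, not blocks
        if report and total_size > 0:
            report(min(100, written * 100 // total_size))
    return written


def download(url, local_path):
    """Download url to local_path; return False on an HTTP or URL error."""
    try:
        with urllib.request.urlopen(url) as response:
            # Content-Length may be absent, so fall back to 0 (unknown size).
            total = int(response.headers.get("Content-Length") or 0)
            with open(local_path, "wb") as fp:
                copy_with_progress(response, fp, total)
        return True
    except urllib.error.URLError:  # HTTPError (e.g. a 404) is a subclass
        return False


# Offline check of the copy loop using in-memory streams.
percentages = []
source = io.BytesIO(b"x" * 20000)
target = io.BytesIO()
copied = copy_with_progress(source, target, 20000, report=percentages.append)
print(copied, percentages)  # 20000 [40, 81, 100]
```

Because `urlopen` raises before the local file is opened, a missing remote file leaves nothing on disk, which is the success signal the question asks for. A server that answers 200 with an HTML error page would still need a header check.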
    