I am downloading a file using Python's urllib2. How do I check how large the file is?

时光毁灭记忆、已成空白 posted on 2019-11-30 09:07:06
Andrew Dalke

There's no need to drop down to httplib as bobince did. You can do all of that with urllib2 directly:

>>> import urllib2
>>> f = urllib2.urlopen("http://dalkescientific.com")
>>> f.headers.items()
[('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'),
 ('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'),
 ('etag', '"19fa87-1d6f-447f627da7dc0"'), ('date', 'Wed, 28 Oct 2009 19:59:10 GMT'),
 ('content-type', 'text/html')]
>>> f.headers["Content-Length"]
'7535'
>>> 

If you use httplib then you may have to implement redirect handling, proxy support, and the other nice things that urllib2 does for you.
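
Note that the header value in the session above is a string ('7535'), and a server is free to omit it. As a minimal sketch, a helper like the following could turn it into a number safely. The name content_length is hypothetical (it is not part of urllib2), and the code assumes only that the headers object has a .get() method, as f.headers does:

```python
def content_length(headers):
    """Return the declared body size in bytes, or None if unknown.

    `headers` is any mapping-like object with a .get() method, such as
    the f.headers object from the session above. This helper is an
    illustration, not part of urllib2.
    """
    value = headers.get("Content-Length")
    if value is None:
        return None  # header absent
    try:
        length = int(value)
    except ValueError:
        return None  # header present but not a number
    return length if length >= 0 else None


# Usage with a plain dict standing in for real response headers:
print(content_length({"Content-Length": "7535"}))  # 7535
print(content_length({}))                          # None
```

With a real response you would call content_length(f.headers) and treat None as "size unknown".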

You could say:

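import urllib2

maxlength = 12 * 1024 * 1024  # 12 MB cap
# Read at most one byte past the cap; getting that extra byte means the body is too large.
thefile = urllib2.urlopen(request).read(maxlength + 1)
if len(thefile) == maxlength + 1:
    raise ThrowToysOutOfPramException()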

but then of course you've still read 12MB of unwanted data. If you want to minimise the risk of this happening you can check the HTTP Content-Length header, if present (it might not be). But to do that you need to drop down to httplib instead of the more general urllib.

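import httplib
import urlparse

# ep_url, ua and maxlength are assumed to be defined earlier.
u = urlparse.urlparse(ep_url)
cn = httplib.HTTPConnection(u.netloc)
# An empty path is not a valid request target, so fall back to '/'.
cn.request('GET', u.path or '/', headers={'User-Agent': ua})
r = cn.getresponse()

# Reject early if the server declares a body over the limit.
try:
    length = int(r.getheader('Content-Length', '0'))
except ValueError:
    length = 0
if length > maxlength:
    raise IAmCrossException()

# The header may be absent or wrong, so cap the actual read as well.
thefile = r.read(maxlength + 1)
if len(thefile) == maxlength + 1:
    raise IAmStillCrossException()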

You can check the length before asking to get the file too, if you prefer. This is basically the same as above, except using the method 'HEAD' instead of 'GET'.
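
The two-stage check in this answer (refuse if the declared length is too large, then cap the actual read anyway) could be factored into a helper that gives up as soon as the limit is crossed instead of buffering the whole capped read first. This is only a sketch: read_capped, TooLargeError and the 64 KB chunk size are all hypothetical choices, and the stream can be any object with a read(n) method, such as the response objects used above.

```python
import io


class TooLargeError(Exception):
    """Hypothetical exception raised when a body exceeds the limit."""


def read_capped(stream, maxlength, declared_length=None, chunk_size=64 * 1024):
    """Read all of `stream`, raising TooLargeError past `maxlength` bytes.

    `declared_length` is the parsed Content-Length, or None if the
    server did not send one. It is checked first, but never trusted.
    """
    if declared_length is not None and declared_length > maxlength:
        raise TooLargeError("declared %d bytes, limit is %d"
                            % (declared_length, maxlength))
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of body
        total += len(chunk)
        if total > maxlength:
            raise TooLargeError("body exceeds %d bytes" % maxlength)
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for an HTTP response:
body = read_capped(io.BytesIO(b"hello"), maxlength=12 * 1024 * 1024)
print(len(body))  # 5
```

Reading in chunks means an oversized body is abandoned after at most one chunk past the limit.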

SeriousCallersOnly

You can check the Content-Length in a HEAD request first, but be warned: this header doesn't have to be set. See "How do you send a HEAD HTTP request in Python 2?"
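
One way to send a HEAD request without leaving urllib2 is to subclass Request and override get_method(). The sketch below assumes that approach. HeadRequest and remote_size are hypothetical names, and the import fallback to urllib.request is an assumption added so the same code also runs on Python 3:

```python
try:
    import urllib2 as urlrequest          # Python 2, as in the question
except ImportError:
    import urllib.request as urlrequest   # Python 3 equivalent (assumption)


class HeadRequest(urlrequest.Request):
    """A Request whose HTTP method is HEAD instead of GET (illustrative)."""

    def get_method(self):
        return "HEAD"


def remote_size(url):
    """Return the Content-Length reported for `url`, or None if unset."""
    response = urlrequest.urlopen(HeadRequest(url))
    try:
        value = response.info().get("Content-Length")
    finally:
        response.close()
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # header missing or malformed
```

Calling remote_size("http://example.com/file.zip") would then return an integer byte count, or None when the server does not report one.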

This will work only if the Content-Length header is set:

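import urllib2
response = urllib2.urlopen("http://example.com/file.zip")
# getheader() returns None when the header is missing, in which case int() raises TypeError.
total_size = int(response.info().getheader('Content-Length'))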