Using urllib2 in Python. How do I get the name of the file I am downloading?

ε祈祈猫儿з posted on 2019-12-05 08:28:37
Andreas Jung

The filename is usually included by the server through the content-disposition header:

content-disposition: attachment; filename=foo.pdf

You have access to the headers through

result = urllib2.urlopen(...)
result.info()  # contains the headers
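
Once you have the header value, you still need to pull the filename parameter out of it. Here is a minimal sketch, assuming you pass in the raw header string (for example whatever result.info().get('Content-Disposition') gives you). The helper name filename_from_content_disposition is made up for illustration, and it relies on the standard library's email.message.Message.get_filename() instead of splitting the string by hand:

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg['Content-Disposition'] = header_value
    # get_filename() deals with quoted values and returns None
    # when the header carries no filename parameter.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename=foo.pdf'))
```

Since the server chooses this name, it is probably wise to run it through os.path.basename() before using it as a path on disk.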


>>> import urllib2
>>> result = urllib2.urlopen('http://zopyx.com')
>>> print result
<addinfourl at 4302289808 whose fp = <socket._fileobject object at 0x1006dd5d0>>
>>> result.info()
<httplib.HTTPMessage instance at 0x1006fbab8>
>>> result.info().headers
['Date: Mon, 04 Apr 2011 02:08:28 GMT\r\n', 'Server: Zope/(unreleased version, python 2.4.6, linux2) ZServer/1.1 Plone/3.3.4\r\n', 'Content-Length: 15321\r\n', 'Content-Type: text/html; charset=utf-8\r\n', 'Via: 1.1 www.zopyx.com\r\n', 'Cache-Control: max-age=3600\r\n', 'Expires: Mon, 04 Apr 2011 03:08:28 GMT\r\n', 'Connection: close\r\n']

See

http://docs.python.org/library/urllib2.html

But be aware that this header does not need to be present. If it is missing, you need to generate a reasonable name yourself from the requested URL, e.g. from the last component of the URL path. Use the urlparse() function from Python's urlparse module in this case.
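
A rough sketch of that fallback, assuming a plain URL string as input. The function name filename_from_url and the 'download' default are my own choices for illustration, and the try/except import is only there so the snippet runs on both Python 2 and Python 3:

```python
import posixpath

try:
    from urllib.parse import urlparse, unquote  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2
    from urllib import unquote


def filename_from_url(url, default='download'):
    """Return the last path component of url, or default if there is none."""
    # Only look at the path, so the query string and fragment are ignored.
    path = urlparse(url).path
    # URL paths always use '/', so posixpath is used instead of os.path.
    name = posixpath.basename(unquote(path))
    return name or default


print(filename_from_url('http://example.com/files/foo.pdf?download=1'))
```

URLs that end in a slash, or have no path at all, have no last component, which is why the sketch falls back to a default name.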

My issue with the previous answers is that they were using the original URL, and that would fail in the case of a redirect. Here's how I do it: (note the use of result.url instead of url)

import os
import urllib2
result = urllib2.urlopen(url)
filename = os.path.basename(urllib2.urlparse.urlparse(result.url).path)

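You can also do that using urlretrieve. It returns a (filename, headers) tuple, where filename is the local path the data was saved to and headers is the same kind of object that info() returns, so the content-disposition header is available there if the server sent it: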

http://docs.python.org/library/urllib.html

I had an issue where the server did not give me any content-disposition header. If that is also your case, you can extract the filename from the URL like this:

os.path.basename(urlparse.urlparse(file_url).path)

In my case, I used file_stream.headers.subtype, which holds the MIME subtype of the response (often usable as a file extension), and I named the files after my Django model's slug. Here's an example:

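import os
import urllib2
import urlparse
from tempfile import NamedTemporaryFile

from django.core.files import File

# Download into a temporary file that is deleted when it is closed
tmp_file = NamedTemporaryFile(delete=True)
file_stream = urllib2.urlopen(file_url)
tmp_file.write(file_stream.read())
tmp_file.flush()

# headers.subtype is the MIME subtype, e.g. "pdf" for application/pdf
new_file_name = "some_prefix_" + my_model.slug + "." + file_stream.headers.subtype
# You may prefer this:
# new_file_name = os.path.basename(urlparse.urlparse(file_url).path)

my_model.file.save(new_file_name, File(tmp_file))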

The last line saves the file using Django's save method, which also handles duplicate file names by adding random characters at the end :)
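
One caveat about using the MIME subtype as an extension: it does not always match a real one (text/plain gives "plain", for instance). A possible alternative, sketched here under the assumption that you have the raw Content-Type value as a string, is to ask the standard mimetypes module. The helper name extension_for_content_type is hypothetical:

```python
import mimetypes


def extension_for_content_type(content_type, default=''):
    """Guess a file extension (with leading dot) from a Content-Type value."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime_type = content_type.split(';', 1)[0].strip().lower()
    # guess_extension() returns None for types it does not know.
    return mimetypes.guess_extension(mime_type) or default


print(extension_for_content_type('application/pdf'))
```

The result already includes the dot, so it can be appended straight to the slug. The exact extension chosen for some types depends on the Python version and the system's MIME tables.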

Awesome.
