If I open a file using urllib2, like so:
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
Is there an easy way to get the file name?
import os,urllib2
resp = urllib2.urlopen('http://www.example.com/index.html')
my_url = resp.geturl()
os.path.split(my_url)[1]
# 'index.html'
This is not openfile, but maybe still helps :)
The os.path.basename function works not only for file paths but also for URLs, so you don't have to parse the URL manually. Also note that you should use result.url instead of the original URL, so that redirect responses are followed:
import os
import urllib2
result = urllib2.urlopen(url)
real_url = urllib2.urlparse.urlparse(result.url)
filename = os.path.basename(real_url.path)
Not that I know of, but you can parse it easily enough like this:
url = 'http://example.com/somefile.zip'
print url.split('/')[-1]
Did you mean urllib2.urlopen?
You could potentially lift the intended filename, if the server sends a Content-Disposition header, by checking remotefile.info()['Content-Disposition'], but as it is I think you'll just have to parse the URL.
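Note that the Content-Disposition value is the raw header, typically of the form attachment; filename="somefile.zip", so the name still has to be pulled out of it. A minimal sketch, assuming the standard library's email.message.Message is used to parse the header parameters. The helper name filename_from_content_disposition and the sample header values are made up for illustration:

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    # Let the email package handle parameter parsing (quoting, RFC 2231 encoding).
    msg['Content-Disposition'] = header_value
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="somefile.zip"'))
print(filename_from_content_disposition('inline'))
```

With urllib2 this would probably be called as filename_from_content_disposition(remotefile.info().get('Content-Disposition')). Since the name comes from the server, it may be worth passing the result through os.path.basename before using it as a local path.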
You could use urlparse.urlsplit, but if you have any URLs like the second example, you'll end up having to pull the file name out yourself anyway:
>>> urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')
>>> urlparse.urlsplit('http://example.com/somedir/somefile.zip')
('http', 'example.com', '/somedir/somefile.zip', '', '')
Might as well just do this:
>>> 'http://example.com/somefile.zip'.split('/')[-1]
'somefile.zip'
>>> 'http://example.com/somedir/somefile.zip'.split('/')[-1]
'somefile.zip'
If you only want the file name itself, and there are no query variables at the end (as in http://example.com/somedir/somefile.zip?foo=bar), then you can use os.path.basename for this:
[user@host]$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.basename("http://example.com/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar")
'somefile.zip?foo=bar'
Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename() then you don't have to worry about that, since it returns only the final part of the URL or file path.
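For URLs that do carry a query string, the query can be dropped before the last path component is taken. A sketch, assuming posixpath.basename (URL paths always use forward slashes) and a hypothetical helper name filename_from_url. The try/except import is only there so the snippet also runs on Python 3:

```python
import posixpath

try:
    from urlparse import urlsplit          # Python 2
except ImportError:
    from urllib.parse import urlsplit      # Python 3


def filename_from_url(url):
    """Return the last path component of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    return posixpath.basename(path)


print(filename_from_url('http://example.com/somedir/somefile.zip?foo=bar'))
# somefile.zip
print(filename_from_url('http://example.com/somedir/'))
# (empty string: this URL does not name a file)
```

An empty result is worth checking for, since a URL that ends in a slash has no file name to extract.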
You could also combine the two best-rated answers: use urllib2.urlparse.urlsplit() to get the path part of the URL, and then os.path.basename for the actual file name.
Full code would be:
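>>> import os, urllib2
>>> remotefile = urllib2.urlopen(url)
>>> try:
...     # Raw header value, e.g. 'attachment; filename="somefile.zip"'; it still needs parsing.
...     filename = remotefile.info()['Content-Disposition']
... except KeyError:
...     # No such header: fall back to the last path component of the URL.
...     filename = os.path.basename(urllib2.urlparse.urlsplit(url).path)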