urllib2 file name

前端未结

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the

14 Answers

悲&欢浪女

2020-12-01 04:04

I think that "the file name" isn't a very well-defined concept when it comes to HTTP transfers. The server might (but is not required to) provide one in a "Content-Disposition" header; you can try to get that with remotefile.headers['Content-Disposition']. If this fails, you probably have to parse the URI yourself.
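
One way to combine the two ideas above, as a rough sketch rather than a definitive implementation: prefer the Content-Disposition header when the server sends one, and fall back to the URL path otherwise. This assumes Python 3 (where urlparse lives in urllib.parse); guess_filename is a hypothetical helper name, and the header is parsed with the standard library's email.message.Message.

```python
from email.message import Message
from posixpath import basename
from urllib.parse import unquote, urlsplit


def guess_filename(url, content_disposition=None):
    """Best-guess file name for a download (hypothetical helper)."""
    if content_disposition:
        # Let the email package parse the header parameters for us.
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = msg.get_filename()
        if name:
            # Drop any directory parts the server may have included.
            return basename(name)
    # Fallback: last segment of the URL path, percent-decoded.
    return unquote(basename(urlsplit(url).path))


print(guess_filename('http://example.com/somefile.zip'))
print(guess_filename('http://example.com/download?id=3',
                     'attachment; filename="report.pdf"'))
```

With a real response you could pass something like remotefile.headers.get('Content-Disposition') as the second argument.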

执念已碎

2020-12-01 04:04
I guess it depends on what you mean by parsing. There is no way to get the filename without parsing the URL, i.e. the remote server doesn't give you a filename. However, you don't have to do much yourself; there's the urlparse module:
```
In [9]: urlparse.urlparse('http://example.com/somefile.zip')
Out[9]: ('http', 'example.com', '/somefile.zip', '', '', '')
```
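
For readers on Python 3, a small sketch of the same idea, assuming the module is imported from urllib.parse (its Python 3 location). The result is a named tuple, so the path can be read by name instead of by index; the query string in the sample URL is only there for illustration.

```python
from urllib.parse import urlparse

parts = urlparse('http://example.com/somefile.zip?download=1')

# Fields are available by name as well as by position.
print(parts.netloc)  # example.com
print(parts.path)    # /somefile.zip
print(parts.query)   # download=1

# The file name is the last segment of the path; the query is not part of it.
filename = parts.path.rsplit('/', 1)[-1]
print(filename)      # somefile.zip
```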

名媛妹妹

2020-12-01 04:04

This uses requests, but you can do the same easily with urllib(2):

import requests
from urllib import unquote
from urlparse import urlparse

url = 'http://example.com/somefile.zip'
filename = None  # set this yourself to skip the detection below

sample = requests.get(url)

if sample.status_code == 200 and not filename:
    # has_key() does not work on the headers object, so use "in" instead
    if 'content-disposition' in sample.headers:
        # take the text after "filename=" and strip quotes and semicolons
        filename = sample.headers['content-disposition'].split('filename=')[-1].replace('"', '').replace(';', '')
    else:
        # try the last value in the query string first
        filename = urlparse(sample.url).query.split('/')[-1].split('=')[-1].split('&')[-1]
        if not filename:
            # otherwise fall back to the last segment of the final URL
            filename = unquote(sample.url.split('/')[-1].split('=')[-1].split('&')[-1])

庸人自扰

2020-12-01 04:11
Do you mean urllib2.urlopen? There is no function called openfile in the urllib2 module.

Anyway, use the urllib2.urlparse functions:
```
>>> from urllib2 import urlparse
>>> print urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')
```
Voila.
闹比i

2020-12-01 04:11
Using PurePosixPath, which is not operating-system dependent and handles simple URLs gracefully, is a Pythonic solution:
```
>>> from pathlib import PurePosixPath
>>> path = PurePosixPath('http://example.com/somefile.zip')
>>> path.name
'somefile.zip'
>>> path = PurePosixPath('http://example.com/nested/somefile.zip')
>>> path.name
'somefile.zip'
```
Notice that there is no network traffic here (those URLs don't go anywhere); this just uses standard path parsing rules.
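
One caveat, sketched here under the assumption of Python 3: PurePosixPath treats the whole string as a path, so a query string ends up inside the name. Splitting the URL first and passing only its path component avoids that. The sample URL with a query string is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = 'http://example.com/nested/somefile.zip?version=2'

# Passing the raw URL keeps the query string attached to the name.
naive = PurePosixPath(url).name
print(naive)   # somefile.zip?version=2

# Passing only the path component gives the bare file name.
safer = PurePosixPath(urlsplit(url).path).name
print(safer)   # somefile.zip
```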
执念已碎

2020-12-01 04:13
Using urlsplit is the safest option:
```
import urlparse

url = 'http://example.com/somefile.zip'
urlparse.urlsplit(url).path.split('/')[-1]
```
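
A slightly more defensive variant, as a sketch assuming Python 3 (urllib.parse in place of urlparse). filename_from_url is a hypothetical helper name; it also percent-decodes the name and returns None when the path has no final segment, for example when the URL ends in a slash.

```python
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the decoded last path segment, or None if there is none."""
    path = urlsplit(url).path          # query string and fragment are excluded
    name = unquote(path.rsplit('/', 1)[-1])
    return name or None


print(filename_from_url('http://example.com/somefile.zip'))
print(filename_from_url('http://example.com/my%20file.zip?x=1#top'))
print(filename_from_url('http://example.com/dir/'))
```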