urllib2 file name

感动是毒 2020-12-01 03:27

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the

14 answers
  • 2020-12-01 04:04

    I think "the file name" isn't a very well-defined concept when it comes to HTTP transfers. The server may (but is not required to) provide one in a Content-Disposition header, which you can try to read with remotefile.headers['Content-Disposition']. If that fails, you probably have to parse the URI yourself.
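A minimal sketch of that approach, written for Python 3 (an assumption, since the question uses Python 2's urllib2). The helper name `filename_from_response` is hypothetical; it takes the URL and any mapping of response headers, and it lets the standard library's `email.message.Message` do the header parsing rather than splitting the string by hand.

```python
from email.message import Message
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_response(url, headers):
    """Prefer the Content-Disposition file name, else the last URL path segment.

    `headers` is any mapping with .get(); a plain dict is case-sensitive,
    so the key must be spelled 'Content-Disposition' in that case.
    """
    disposition = headers.get('Content-Disposition')
    if disposition:
        msg = Message()
        msg['Content-Disposition'] = disposition
        name = msg.get_filename()
        if name:
            # Drop any directory part the server may have included.
            return basename(name)
    # No usable header: fall back to parsing the URL.
    return unquote(basename(urlsplit(url).path))
```

With a real Python 3 response you could pass `response.headers` directly, since it also supports `.get()`.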

  • 2020-12-01 04:04

    I guess it depends on what you mean by parsing. Unless the server sends a Content-Disposition header, it doesn't give you a file name, so you have to get it from the URL. You don't have to do much yourself, though; there's the urlparse module:

    In [9]: urlparse.urlparse('http://example.com/somefile.zip')
    Out[9]: ('http', 'example.com', '/somefile.zip', '', '', '')
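The parsed tuple still needs one more step to yield a file name. A short sketch, assuming Python 3 (where the module lives at `urllib.parse`): take the path component and keep its last segment.

```python
import posixpath
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

parts = urlparse('http://example.com/somefile.zip')
# parts.path is '/somefile.zip'; basename keeps what follows the last '/'
filename = posixpath.basename(parts.path)
print(filename)  # somefile.zip
```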
    
  • 2020-12-01 04:04

    This uses requests, but you can do the same thing easily with urllib(2):

    import requests
    from urllib import unquote
    from urlparse import urlparse

    url = 'http://example.com/somefile.zip'
    filename = None  # set a name here to skip the detection below

    sample = requests.get(url)

    if sample.status_code == 200 and not filename:
        # sample.headers is case-insensitive; has_key does not work on it
        if 'content-disposition' in sample.headers:
            # take whatever follows 'filename=' and strip quotes and semicolons
            filename = sample.headers['content-disposition'].split('filename=')[-1].replace('"', '').replace(';', '')
        else:
            # try the last value in the query string first
            filename = urlparse(sample.url).query.split('/')[-1].split('=')[-1].split('&')[-1]
            if not filename:
                # fall back to the last segment of the final (post-redirect) URL
                filename = unquote(sample.url.split('/')[-1].split('=')[-1].split('&')[-1])
    
  • 2020-12-01 04:11

    Do you mean urllib2.urlopen? There is no function called openfile in the urllib2 module.

    Anyway, use the urllib2.urlparse functions:

    >>> from urllib2 import urlparse
    >>> print urlparse.urlsplit('http://example.com/somefile.zip')
    ('http', 'example.com', '/somefile.zip', '', '')
    

    Voila.

  • 2020-12-01 04:11

    One option is PurePosixPath, which is not operating-system dependent and copes with simple URLs like these (it treats the URL as a plain path, so a query string would end up in the name):

    >>> from pathlib import PurePosixPath
    >>> path = PurePosixPath('http://example.com/somefile.zip')
    >>> path.name
    'somefile.zip'
    >>> path = PurePosixPath('http://example.com/nested/somefile.zip')
    >>> path.name
    'somefile.zip'
    

    Note that there is no network traffic here (those URLs are never fetched); this is pure string parsing.
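Because PurePosixPath knows nothing about URL syntax, a query string stays attached to the name. The sketch below (Python 3; the `?version=2` URL is a made-up example) shows the difference when the path is isolated with `urlsplit` first.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = 'http://example.com/nested/somefile.zip?version=2'

# Treating the whole URL as a path keeps the query string in the name.
naive = PurePosixPath(url).name
# Splitting the URL first hands PurePosixPath only '/nested/somefile.zip'.
clean = PurePosixPath(urlsplit(url).path).name

print(naive)  # somefile.zip?version=2
print(clean)  # somefile.zip
```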

  • 2020-12-01 04:13

    Using urlsplit is the safest option:

    import urlparse

    url = 'http://example.com/somefile.zip'
    urlparse.urlsplit(url).path.split('/')[-1]  # 'somefile.zip'
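Two cases the one-liner does not cover are percent-encoded names and URLs whose path ends in a slash, where the last segment is empty. A hedged Python 3 sketch follows; the helper name `url_filename` and the `'index.html'` default are illustrative assumptions, not anything the server promises.

```python
from urllib.parse import unquote, urlsplit


def url_filename(url, default='index.html'):
    """Return the decoded last path segment, or `default` if it is empty."""
    # urlsplit drops the query string and fragment from .path
    segment = urlsplit(url).path.split('/')[-1]
    return unquote(segment) or default


print(url_filename('http://example.com/somefile.zip'))            # somefile.zip
print(url_filename('http://example.com/my%20file.zip?dl=1#top'))  # my file.zip
print(url_filename('http://example.com/files/'))                  # index.html
```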
    