urllib2 file name

感动是毒 2020-12-01 03:27

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the

14 Answers
  • 2020-12-01 04:13
    import os
    import urllib2

    resp = urllib2.urlopen('http://www.example.com/index.html')
    my_url = resp.geturl()  # URL actually retrieved, after any redirects

    os.path.split(my_url)[1]
    # 'index.html'
    

    This is not openfile, but maybe still helps :)

  • 2020-12-01 04:17

    os.path.basename works on URL paths as well as file paths, so you don't have to split the path by hand. Use result.url rather than the original URL, so that the name reflects any redirects that were followed:

    import os
    import urllib2

    url = 'http://example.com/somefile.zip'
    result = urllib2.urlopen(url)
    # result.url is the final URL, after any redirects
    real_url = urllib2.urlparse.urlparse(result.url)
    filename = os.path.basename(real_url.path)
    
  • 2020-12-01 04:19

    Not that I know of, but you can parse it easily enough like this:

    url = 'http://example.com/somefile.zip'
    print url.split('/')[-1]
    

  • 2020-12-01 04:22

    Did you mean urllib2.urlopen?

    If the server sends a Content-Disposition header, you could lift the intended filename from remotefile.info()['Content-Disposition'] (the value looks like attachment; filename="somefile.zip", so the name still has to be extracted from it). Otherwise I think you'll just have to parse the URL.

    You could use urlparse.urlsplit, but for URLs like the second example here you still end up pulling the file name out of the path yourself:

    >>> urlparse.urlsplit('http://example.com/somefile.zip')
    ('http', 'example.com', '/somefile.zip', '', '')
    >>> urlparse.urlsplit('http://example.com/somedir/somefile.zip')
    ('http', 'example.com', '/somedir/somefile.zip', '', '')
    

    Might as well just do this:

    >>> 'http://example.com/somefile.zip'.split('/')[-1]
    'somefile.zip'
    >>> 'http://example.com/somedir/somefile.zip'.split('/')[-1]
    'somefile.zip'
    
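To make the Content-Disposition idea concrete, here is a minimal sketch of pulling the filename parameter out of a raw header value (the kind of string that remotefile.info()['Content-Disposition'] returns). The helper name filename_from_content_disposition is hypothetical, and using the standard library's email.message.Message for the parsing is just one option, not necessarily what the answer's author had in mind.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper for illustration; header_value is the raw header string.
    """
    if not header_value:
        return None
    msg = Message()
    msg['Content-Disposition'] = header_value
    # get_filename() reads the header's filename parameter and removes the quotes.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="somefile.zip"'))
# somefile.zip
print(filename_from_content_disposition('inline'))
# None
```

When this returns None, parsing the URL as described in the answers is the remaining option.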
  • 2020-12-01 04:22

    If you only want the file name itself, and the URL has no query string at the end (as in http://example.com/somedir/somefile.zip?foo=bar), you can use os.path.basename:

    [user@host]$ python
    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.path.basename("http://example.com/somefile.zip")
    'somefile.zip'
    >>> os.path.basename("http://example.com/somedir/somefile.zip")
    'somefile.zip'
    >>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar")
    'somefile.zip?foo=bar'
    

    Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename() then you don't have to worry about that, since it returns only the final part of the URL or file path.

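The query-string case from the last transcript line can be handled by splitting the URL first and taking the basename of the path component only. This is a rough sketch rather than a definitive recipe: the function name filename_from_url is made up for illustration, and the import fallback is there only so that the same snippet runs on Python 3, where urlparse became urllib.parse.

```python
import os

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2


def filename_from_url(url):
    """Return the last path segment of url, ignoring query string and fragment.

    Hypothetical helper for illustration.
    """
    # urlsplit() separates the path from '?foo=bar' and '#fragment' parts.
    path = urlsplit(url).path
    return os.path.basename(path)


print(filename_from_url('http://example.com/somedir/somefile.zip?foo=bar'))
# somefile.zip
```

Note that a URL ending in a slash, such as http://example.com/somedir/, yields an empty string, so callers may want a default name for that case.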
  • 2020-12-01 04:29

    You could also combine the two best-rated answers: use urllib2.urlparse.urlsplit() to get the path part of the URL, then os.path.basename to get the actual file name.

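    Full code would be:

    >>> import cgi, os, urllib2
    >>> remotefile = urllib2.urlopen(url)
    >>> try:
    ...     # The header value looks like: attachment; filename="somefile.zip"
    ...     value, params = cgi.parse_header(remotefile.info()['Content-Disposition'])
    ...     filename = params['filename']
    ... except KeyError:
    ...     # No usable header: fall back to the last path segment of the final URL
    ...     filename = os.path.basename(urllib2.urlparse.urlsplit(remotefile.geturl()).path)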
    