How to get a PDF filename with Python requests?

渐次进展 2020-12-09 08:16

I'm using the Python requests library to get a PDF file from the web. This works fine, but I now also want the original filename. If I go to a PDF file in Firefox and click

5 Answers
  • 2020-12-09 08:20

    Apparently, for this particular resource it is in:

    r.headers['content-disposition']
    

    I don't know if that is always the case, though; the header is optional, so this lookup raises KeyError when the server does not send it.

  • 2020-12-09 08:23

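    An easy Python 3 implementation to get the filename from Content-Disposition:

    import requests
    url = "http://www.example.com/downloads/sample.pdf"  # placeholder; use your own URL
    response = requests.get(url)
    # The header is optional, so default to "" rather than calling .split() on None
    disposition = response.headers.get("Content-Disposition", "")
    print(disposition.split("filename=")[1] if "filename=" in disposition else None)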
    
  • 2020-12-09 08:26

    The filename is specified in the HTTP header Content-Disposition. To extract the name you would do:

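    import re
    # r is the requests.Response object from the question
    d = r.headers['content-disposition']
    # Capture the value up to the next ";" and skip an optional opening quote
    fname = re.findall(r'filename="?([^";]+)', d)[0]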
    

    The name is extracted from the header string with a regular expression (the re module).
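
    As a possible alternative to a hand-written regex, the standard library's email.message.Message can parse the header's parameters, including quoted values. This is only a minimal sketch; the helper name filename_from_disposition and the sample header values are made up for illustration:

```python
from email.message import Message


def filename_from_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    # Let the email parser split the header into its parameters
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when there is no filename parameter
    return msg.get_filename()


print(filename_from_disposition('attachment; filename="report.pdf"'))  # report.pdf
print(filename_from_disposition("inline"))  # None
```

    With requests, you would presumably pass r.headers.get('Content-Disposition') to the helper, so a missing header simply yields None.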

  • 2020-12-09 08:34

    Building on some of the other answers, here's how I do it. If there isn't a Content-Disposition header, I parse it from the download URL:

    import re
    import requests
    from requests.exceptions import RequestException


    url = 'http://www.example.com/downloads/sample.pdf'

    try:
        with requests.get(url) as r:
            # The header may be absent, or present without a filename parameter
            disposition = r.headers.get("Content-Disposition", "")
            match = re.search(r'filename="?([^";]+)', disposition)
            if match:
                fname = match.group(1)
            else:
                # Fall back to the last path segment of the download URL
                fname = url.split("/")[-1]

            print(fname)
    except RequestException as e:
        print(e)
    

    There are arguably better ways of parsing the URL string, but for simplicity I didn't want to involve any more libraries.
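
    One such way that stays within the standard library is urllib.parse, which separates the path from any query string or fragment before the last segment is taken. A rough sketch; filename_from_url is a hypothetical helper name and the URLs are examples only:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    # Take the last segment first, then decode percent-escapes such as %20
    return unquote(posixpath.basename(path))


print(filename_from_url("http://www.example.com/downloads/sample.pdf?token=abc"))  # sample.pdf
```

    Note that a URL ending in a slash gives an empty string here, so a caller would still need a default name for that case.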

  • 2020-12-09 08:36

    You can use werkzeug's parse_options_header to parse headers that carry options, such as Content-Disposition: https://werkzeug.palletsprojects.com/en/0.15.x/http/#werkzeug.http.parse_options_header

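    >>> # Import from werkzeug.http; newer releases no longer expose it at the top level
    >>> from werkzeug.http import parse_options_header
    >>> parse_options_header('text/html; charset=utf8')
    ('text/html', {'charset': 'utf8'})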
    