How to get PDF filename with Python requests?

渐次进展 2020-12-09 08:16

I'm using the Python requests lib to get a PDF file from the web. This works fine, but I now also want the original filename. If I go to a PDF file in Firefox and click

5 Answers
  •  伪装坚强ぢ
    2020-12-09 08:34

    Building on some of the other answers, here's how I do it. If there isn't a Content-Disposition header, I parse it from the download URL:

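    import re
    import requests
    from requests.exceptions import RequestException
    
    
    url = 'http://www.example.com/downloads/sample.pdf'
    
    try:
        with requests.get(url) as r:
            # r.headers is case-insensitive; .get() returns '' if the header is absent.
            disposition = r.headers.get("Content-Disposition", "")
            # Capture the filename value, with or without surrounding quotes.
            match = re.search(r'filename="?([^";]+)"?', disposition)
            if match:
                fname = match.group(1).strip()
            else:
                # No usable header: fall back to the last segment of the URL.
                fname = url.split("/")[-1]
    
            print(fname)
    except RequestException as e:
        print(e)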
    

    There are arguably better ways of parsing the URL string, but for simplicity I didn't want to involve any more libraries.
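
    As a rough illustration of the "better ways" mentioned above, here is a sketch that stays within the standard library. The helper names (filename_from_header, filename_from_url, guess_filename) are made up for this example and are not part of requests. It assumes the header follows the usual "attachment; filename=..." shape and that the last URL path segment is a reasonable fallback.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_header(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not content_disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_filename() strips surrounding quotes and returns None
    # when the header carries no filename parameter.
    return msg.get_filename()


def filename_from_url(url):
    """Return the last path segment of the URL, ignoring query and fragment."""
    path = urlsplit(url).path
    # Decode percent-escapes such as %20 in the segment.
    return unquote(posixpath.basename(path))


def guess_filename(url, content_disposition=None):
    """Prefer the header's filename; otherwise fall back to the URL path."""
    return filename_from_header(content_disposition) or filename_from_url(url)
```

    With requests, this could be called as guess_filename(r.url, r.headers.get("Content-Disposition")). Unlike url.split("/")[-1], the URL fallback drops any query string or fragment, so a link ending in sample.pdf?token=abc still yields sample.pdf.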
