How to un-shorten (resolve) a URL using Python when the final URL is HTTPS?

旧时模样 submitted on 2019-12-09 23:46:03

Question


I am looking to unshorten (resolve) a URL in Python when the final URL is HTTPS. I have seen the question "How can I un-shorten a URL using python?" (as well as similar ones); however, as noted in a comment on the accepted answer, that solution only works when the URL is not redirected to HTTPS.

For reference, the code in that question (which works fine when redirecting to http urls) is:

# This is for Py2k.  For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    resource = parsed.path
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location'))  # changed to process chains of short urls
    else:
        return url

(Note: for obvious bandwidth reasons, I want to achieve this by requesting only the response headers, i.e. like the HTTP-only version above, and not by fetching the content of the whole pages.)


Answer 1:


You can get the scheme from the parsed URL and use HTTPSConnection instead of HTTPConnection when parsed.scheme is https.
Alternatively, the requests library does this very simply, and a HEAD request still avoids downloading the page bodies:

>>> import requests
>>> r = requests.head('http://bit.ly/IFHzvO', allow_redirects=True)
>>> print(r.url)
https://www.google.com
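
The first suggestion (choosing the connection class from the parsed scheme) could look roughly like the following in Python 3. This is only a sketch under some assumptions: the helper names connection_for and request_target, the 10-second timeout, and the cap of 10 redirects are illustrative choices and not part of the original code. It also uses a loop instead of recursion and resolves relative Location headers with urljoin.

```python
import http.client
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 10  # assumed cap so a redirect loop cannot run forever


def connection_for(url, timeout=10):
    """Return an unopened connection whose class matches the URL scheme."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    # No network traffic happens here; the socket opens on request().
    return conn_class(parts.netloc, timeout=timeout)


def request_target(url):
    """Build the path (plus query string, if any) to send in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def unshorten_url(url, max_redirects=MAX_REDIRECTS):
    """Follow redirects with HEAD requests only and return the last URL seen."""
    for _ in range(max_redirects):
        conn = connection_for(url)
        try:
            conn.request("HEAD", request_target(url))
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        if status // 100 == 3 and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            return url
    return url  # gave up after max_redirects hops
```

Calling unshorten_url('http://bit.ly/IFHzvO') would then follow the chain across both http and https hops. Note that some servers answer HEAD requests differently from GET, so the result may not always match what a browser ends up on.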


Source: https://stackoverflow.com/questions/29425378/how-to-un-shorten-resolve-a-url-using-python-when-final-url-is-https
