Python: get only the headers using urllib2

Submitted by 邮差的信 on 2019-12-01 13:25:26

Firstly, your code contains several bugs:

  1. Each call to getheadersonly installs a new global opener, which is then used by every subsequent call to urllib2.urlopen.

  2. You make two HTTP requests to get two different attributes of the same response.

  3. The implementation of urllib2.HTTPRedirectHandler.http_error_302 is not trivial, and I do not understand how it could prevent redirections in the first place.

Basically, you should understand that each handler is installed in an opener to handle a certain kind of response. urllib2.HTTPRedirectHandler exists to turn certain HTTP status codes (such as 301 and 302) into redirections. If you do not want redirections, do not add a redirection handler to the opener. If you do not want to open FTP links, do not add FTPHandler, and so on.
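To see this handler behavior in isolation, here is a minimal sketch in Python 3, where urllib2 became urllib.request. It assumes a throwaway local server (the hypothetical DemoHandler class, with made-up paths /old and /new) so it runs without network access; open_with is also a hypothetical helper. The same 302 comes back untouched when the opener has no redirect handler, and is followed once HTTPErrorProcessor and HTTPRedirectHandler are added.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /old answers 302 to /new, anything else 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass


def open_with(url, follow_redirects):
    """Open url with an opener that contains only the handlers added here."""
    opener = urllib.request.OpenerDirector()
    opener.add_handler(urllib.request.HTTPHandler())
    opener.add_handler(urllib.request.HTTPDefaultErrorHandler())
    if follow_redirects:
        # HTTPErrorProcessor hands non-2xx responses to the error handlers;
        # without it, HTTPRedirectHandler never sees the 302.
        opener.add_handler(urllib.request.HTTPErrorProcessor())
        opener.add_handler(urllib.request.HTTPRedirectHandler())
    with opener.open(url) as res:
        return res.status, res.url


# Bind to an ephemeral port and serve in a background (daemon) thread.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

print(open_with(base + "/old", follow_redirects=False))
print(open_with(base + "/old", follow_redirects=True))
```

The first call reports status 302 at the original URL; the second reports 200 at the /new URL, because only the second opener knows how to turn a 302 into a new request.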

All you need to do is create a new opener, add urllib2.HTTPHandler() to it, make the request a HEAD request, pass an instance of that request to the opener, read the attributes, and close the response.

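```python
import urllib2


class HeadRequest(urllib2.Request):
    # Override the HTTP verb: send HEAD instead of GET/POST.
    def get_method(self):
        return 'HEAD'


def getheadersonly(url, redirections=True):
    # Build a private opener with only the handlers we need. Nothing is
    # installed globally, so later urllib2.urlopen calls are unaffected.
    opener = urllib2.OpenerDirector()
    opener.add_handler(urllib2.HTTPHandler())
    opener.add_handler(urllib2.HTTPDefaultErrorHandler())
    if redirections:
        # HTTPErrorProcessor makes HTTPRedirectHandler work: it routes
        # non-2xx responses to the error handlers. Without it, a 3xx
        # response is simply returned as-is.
        # Note: the stock HTTPRedirectHandler re-issues the redirected
        # request as a plain (GET) Request, not as a HeadRequest.
        opener.add_handler(urllib2.HTTPErrorProcessor())
        opener.add_handler(urllib2.HTTPRedirectHandler())
    try:
        # One request is enough to read the code, headers and final URL.
        res = opener.open(HeadRequest(url))
    except urllib2.HTTPError as res:
        # An HTTPError is also a response object with code, info() and geturl().
        pass
    res.close()
    return dict(code=res.code, headers=res.info(), finalurl=res.geturl())
```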
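A possible Python 3 port of getheadersonly, assuming Python 3.3+ (for Request(..., method="HEAD")). It also addresses one caveat of the standard library: HTTPRedirectHandler.redirect_request builds the follow-up request without carrying over the method, so after a redirect a HEAD silently becomes a GET. The hypothetical HeadRedirectHandler subclass below sets the method back to HEAD. The local DemoHandler server, the SEEN list and the X-Demo header are assumptions that exist only to exercise the function.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeadRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects, but re-issue HEAD instead of the stock handler's GET."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == "HEAD":
            new_req.method = "HEAD"
        return new_req


def getheadersonly(url, redirections=True):
    opener = urllib.request.OpenerDirector()
    opener.add_handler(urllib.request.HTTPHandler())
    opener.add_handler(urllib.request.HTTPDefaultErrorHandler())
    if redirections:
        # Routes 3xx/4xx/5xx responses to the error handlers.
        opener.add_handler(urllib.request.HTTPErrorProcessor())
        opener.add_handler(HeadRedirectHandler())
    try:
        res = opener.open(urllib.request.Request(url, method="HEAD"))
    except urllib.error.HTTPError as err:
        res = err  # an HTTPError also exposes getcode(), info() and geturl()
    res.close()
    return dict(code=res.getcode(), headers=res.info(), finalurl=res.geturl())


# --- hypothetical local server, used only to exercise getheadersonly ---
SEEN = []  # (method, path) pairs the server received


class DemoHandler(BaseHTTPRequestHandler):
    def _reply(self):
        SEEN.append((self.command, self.path))
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
        else:
            self.send_response(200)
            self.send_header("X-Demo", "yes")
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = do_HEAD = _reply

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

result = getheadersonly(base + "/old")
print(result["code"], result["finalurl"], SEEN)
```

With redirections=False the opener has no HTTPErrorProcessor, so the 302 itself is returned and finalurl stays at the original URL.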
bigblind

You can send a HEAD request using httplib (http.client in Python 3). A HEAD request is the same as a GET request, except that the server does not send the message body.
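A minimal sketch of the httplib approach, written for Python 3 where the module is called http.client. The head function is a hypothetical helper, and the local DemoHandler server stands in for a real site. The server advertises the same Content-Length for both methods but only writes the body for GET; http.client knows a HEAD response carries no body, so read() returns b"" instead of waiting for those bytes. Unlike an urllib2 opener with a redirect handler, http.client does not follow redirects: a 3xx status and its Location header are returned as-is.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def head(url, timeout=10):
    """Send a single HEAD request with http.client (redirects are not followed)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("HEAD", path)
        res = conn.getresponse()
        # A HEAD response has no body, so read() returns b"".
        return res.status, dict(res.getheaders()), res.read()
    finally:
        conn.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: GET returns b"hello", HEAD sends headers only."""
    BODY = b"hello"

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_GET(self):
        self.do_HEAD()
        self.wfile.write(self.BODY)

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, headers, body = head("http://127.0.0.1:%d/" % server.server_port)
print(status, headers["Content-Length"], body)
```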
