How do you get default headers in a urllib2 Request?

花落未央 2020-12-13 07:22

I have a Python web client that uses urllib2. It is easy enough to add HTTP headers to my outgoing requests. I just create a dictionary of the headers I want to add, and pa

8 answers
  •  孤街浪徒
    2020-12-13 08:04

    The urllib2 library uses OpenerDirector objects to handle the actual opening. Fortunately, the Python library provides a default opener, so you don't have to build one yourself. It is, however, these OpenerDirector objects that add the extra headers.

    To see what they are after the request has been sent (so that you can log it, for example):

    req = urllib2.Request(url='http://google.com')
    response = urllib2.urlopen(req)
    print req.unredirected_hdrs
    
    (produces {'Host': 'google.com', 'User-agent': 'Python-urllib/2.5'} etc)
    

    The unredirected_hdrs dict is where the OpenerDirector stores its extra headers. Looking at req.headers will show only your own headers; the library leaves those untouched.
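
    To make the split between the two dicts concrete, here is a minimal offline sketch. Nothing is sent: the add_unredirected_header() call only imitates what the library does during a real request, and the header values are made up. The import fallback is my own addition so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2                      # Python 2, as in the answer
except ImportError:
    import urllib.request as urllib2    # Python 3 name for the same module

# 'X-Trace' and 'demo-agent/0.1' are made-up values for illustration.
req = urllib2.Request('http://example.com/', headers={'X-Trace': 'abc'})

# Imitate the library: its additions go in through add_unredirected_header().
req.add_unredirected_header('User-agent', 'demo-agent/0.1')

own_headers = dict(req.headers)                 # what you passed in yourself
library_headers = dict(req.unredirected_hdrs)   # what was added on top

print(own_headers)
print(library_headers)
```

    Note that Request stores header names via str.capitalize(), so 'X-Trace' comes back as 'X-trace'.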

    If you need to see the headers before you send the request, you'll need to subclass the OpenerDirector in order to intercept the transmission.
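
    One possible way to do that without touching OpenerDirector itself is a small extra handler with an http_request hook. This is only a sketch under a few assumptions: HeaderPeekHandler and CannedResponseHandler are hypothetical names, the canned handler exists purely so the demo needs no network, and the import fallback is there so it also runs on Python 3.

```python
import io

try:
    import urllib2                      # Python 2, as in the answer
except ImportError:
    import urllib.request as urllib2    # Python 3 name for the same module


class HeaderPeekHandler(urllib2.BaseHandler):
    """Hypothetical handler: records the headers just before the open step."""
    # HTTPHandler keeps the default order (500); a larger value makes this
    # http_request hook run after the library has added its own headers.
    handler_order = 1000

    def __init__(self):
        self.seen = []

    def http_request(self, req):
        self.seen = sorted(req.header_items())
        return req                      # pre-processors must return the request


class CannedResponseHandler(urllib2.BaseHandler):
    """Hypothetical stub: answers locally so the demo needs no network."""
    handler_order = 50                  # tried before HTTPHandler.http_open

    def http_open(self, req):
        return urllib2.addinfourl(io.BytesIO(b''), {}, req.get_full_url())


peek = HeaderPeekHandler()
opener = urllib2.OpenerDirector()
for handler in (urllib2.HTTPHandler(), peek, CannedResponseHandler()):
    opener.add_handler(handler)

resp = opener.open(urllib2.Request('http://example.com/',
                                   headers={'X-Trace': 'abc'}))
resp.close()

print(peek.seen)
```

    For real traffic, drop CannedResponseHandler. Headers that are only added further down the chain, such as Connection and Accept-Encoding, will not show up at this stage.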

    Hope that helps.

    EDIT: I forgot to mention that, once the request has been sent, req.header_items() will give you a list of tuples of ALL the headers, both your own and the ones added by the OpenerDirector. I should have mentioned this first since it's the most straightforward :-) Sorry.
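
    A quick offline illustration of how header_items() merges the two sources. No request is sent here: the Host header is added by hand as a stand-in for what the library would add, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    import urllib2                      # Python 2, as in the answer
except ImportError:
    import urllib.request as urllib2    # Python 3 name for the same module

req = urllib2.Request('http://example.com/', headers={'X-Trace': 'abc'})

# Stand-in for a header the library would normally add while sending.
req.add_unredirected_header('Host', 'example.com')

# header_items() returns (name, value) tuples from both dicts;
# sorting just makes the order predictable for logging.
all_headers = sorted(req.header_items())
print(all_headers)
```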

    EDIT 2: After your question about an example for defining your own handler, here's the sample I came up with. The concern in any monkeying with the request chain is that we need to be sure that the handler is safe for multiple requests, which is why I'm uncomfortable just replacing the definition of putheader on the HTTPConnection class directly.

    Sadly, because the internals of HTTPConnection and AbstractHTTPHandler are very internal, we have to reproduce much of the code from the Python library to inject our custom behaviour. Assuming I've not goofed below and this works as well as it did in my 5 minutes of testing, please be careful to revisit this override whenever you update your Python version, even by a revision number (i.e. 2.5.x to 2.5.y, or 2.5 to 2.6, etc.).

    I should therefore mention that I am on Python 2.5.1. If you have 2.6 or, particularly, 3.0, you may need to adjust this accordingly.

    Please let me know if this doesn't work. I'm having waaaayyyy too much fun with this question:

    import urllib2
    import httplib
    import socket
    
    
    class CustomHTTPConnection(httplib.HTTPConnection):
    
        def __init__(self, *args, **kwargs):
            httplib.HTTPConnection.__init__(self, *args, **kwargs)
            self.stored_headers = []
    
        def putheader(self, header, value):
            self.stored_headers.append((header, value))
            httplib.HTTPConnection.putheader(self, header, value)
    
    
    class HTTPCaptureHeaderHandler(urllib2.AbstractHTTPHandler):
    
        def http_open(self, req):
            return self.do_open(CustomHTTPConnection, req)
    
        http_request = urllib2.AbstractHTTPHandler.do_request_
    
        def do_open(self, http_class, req):
            # All code here lifted directly from the python library
            host = req.get_host()
            if not host:
                raise urllib2.URLError('no host given')
    
            h = http_class(host) # will parse host:port
            h.set_debuglevel(self._debuglevel)
    
            headers = dict(req.headers)
            headers.update(req.unredirected_hdrs)
            headers["Connection"] = "close"
            headers = dict(
                (name.title(), val) for name, val in headers.items())
            try:
                h.request(req.get_method(), req.get_selector(), req.data, headers)
                r = h.getresponse()
            except socket.error, err: # XXX what error?
                raise urllib2.URLError(err)
            r.recv = r.read
            fp = socket._fileobject(r, close=True)
    
            resp = urllib2.addinfourl(fp, r.msg, req.get_full_url())
            resp.code = r.status
            resp.msg = r.reason
    
            # This is the line we're adding
            req.all_sent_headers = h.stored_headers
            return resp
    
    my_handler = HTTPCaptureHeaderHandler()
    opener = urllib2.OpenerDirector()
    opener.add_handler(my_handler)
    req = urllib2.Request(url='http://www.google.com')
    
    resp = opener.open(req)
    
    print req.all_sent_headers
    
    shows: [('Accept-Encoding', 'identity'), ('Host', 'www.google.com'), ('Connection', 'close'), ('User-Agent', 'Python-urllib/2.5')]
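
    On the point about being safe for multiple requests: because the capture list lives on each connection instance rather than on the HTTPConnection class, two connections never share it. The sketch below checks that offline. CapturingConnection is a hypothetical name, the *values signature is my assumption to match newer versions of putheader, and the import fallback is for Python 3. putrequest() and putheader() only buffer output, so nothing is sent.

```python
try:
    import httplib                      # Python 2, as in the answer
except ImportError:
    import http.client as httplib       # Python 3 name for the same module


class CapturingConnection(httplib.HTTPConnection):
    """Hypothetical variant of the connection class used in the answer."""

    def __init__(self, *args, **kwargs):
        httplib.HTTPConnection.__init__(self, *args, **kwargs)
        self.stored_headers = []        # one list per connection object

    def putheader(self, header, *values):
        # Record first, then let the real implementation buffer the header.
        self.stored_headers.append((header, values))
        httplib.HTTPConnection.putheader(self, header, *values)


first = CapturingConnection('example.com')
second = CapturingConnection('example.org')

# Buffers the request line; the library adds Host and Accept-Encoding here.
first.putrequest('GET', '/')
first.putheader('X-Demo', 'one')        # made-up header for illustration

first_names = [name for name, _ in first.stored_headers]
print(first_names)
print(second.stored_headers)            # untouched by the first connection
```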
    
