Why does Python's urllib2.urlopen() raise an HTTPError for successful status codes?

此生再无相见时 submitted on 2019-12-03 08:01:53

Question


According to the urllib2 documentation,

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.

And yet the following code

request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)

raises an HTTPError with code 201 (created):

ERROR    2011-08-11 20:40:17,318 __init__.py:463] HTTP Error 201: Created

So why is urllib2 throwing HTTPErrors on this successful request?

It's not too much of a pain; I can easily extend the code to:

import urllib2

try:
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
except urllib2.HTTPError as e:
    if e.code == 201:
        pass  # success! :)
    else:
        pass  # fail! :(
else:
    pass  # when will this happen...?

But this doesn't seem like the intended behavior, based on the documentation and the fact that I can't find similar questions about this odd behavior.

Also, what should the else block be expecting? If successful status codes are all interpreted as HTTPErrors, then when does urllib2.urlopen() just return a normal file-like response object like all the urllib2 documentation refers to?


Answer 1:


As the actual library documentation mentions:

For 200 error codes, the response object is returned immediately.

For non-200 error codes, this simply passes the job on to the protocol_error_code handler methods, via OpenerDirector.error(). Eventually, urllib2.HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.

http://docs.python.org/library/urllib2.html#httperrorprocessor-objects
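To see that check in isolation, here is a minimal offline sketch. It assumes Python 3, where urllib2 became urllib.request and HTTPErrorProcessor treats the whole 200-299 range as success; older urllib2 releases, as quoted above, were stricter. The helper names fake_response and run_error_processor and the example.invalid URL are made up for illustration and are not part of the library.

```python
import io
import urllib.error
import urllib.request
import urllib.response


def fake_response(code, msg):
    """Build an in-memory stand-in for an HTTP response (hypothetical helper)."""
    response = urllib.response.addinfourl(
        io.BytesIO(b""), {}, "http://example.invalid/", code)
    response.msg = msg  # HTTPErrorProcessor reads .code, .msg and .info()
    return response


def run_error_processor(code, msg):
    """Feed a fake response through HTTPErrorProcessor and report the outcome."""
    opener = urllib.request.build_opener()
    processor = urllib.request.HTTPErrorProcessor()
    processor.add_parent(opener)  # lets the processor reach opener.error()
    request = urllib.request.Request("http://example.invalid/")
    try:
        processor.http_response(request, fake_response(code, msg))
    except urllib.error.HTTPError as error:
        return "raised HTTPError %d" % error.code
    return "returned response"


if __name__ == "__main__":
    for code, msg in [(201, "Created"), (404, "Not Found")]:
        print(code, run_error_processor(code, msg))
```

No network traffic is involved: the processor only inspects the status code and either returns the response or passes it to OpenerDirector.error(), where the default error handler raises.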




Answer 2:


You can write a custom handler class for use with urllib2 to prevent specific status codes from being raised as HTTPError. Here's one I've used before:

class BetterHTTPErrorProcessor(urllib2.BaseHandler):
    # a substitute/supplement to urllib2.HTTPErrorProcessor
    # that doesn't raise exceptions on status codes 201,204,206
    def http_error_201(self, request, response, code, msg, hdrs):
        return response
    def http_error_204(self, request, response, code, msg, hdrs):
        return response
    def http_error_206(self, request, response, code, msg, hdrs):
        return response

Then you can use it like:

opener = urllib2.build_opener(BetterHTTPErrorProcessor)
urllib2.install_opener(opener)

req = urllib2.Request(url, data, headers)
urllib2.urlopen(req)
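The same idea can be sketched for Python 3, where the module is urllib.request. The factory below generates the http_error_<code> methods instead of writing each one by hand. The name make_tolerant_handler and the example.invalid URL are assumptions for illustration. The demo calls OpenerDirector.error() directly so that it runs without a network connection; on Python 3 a real 201 response would be returned before error() is ever reached, so the hook mainly matters there for codes outside the 2xx range.

```python
import io
import urllib.error
import urllib.request
import urllib.response


def make_tolerant_handler(codes):
    """Build a handler class that returns the response for the given codes."""
    def passthrough(self, request, response, code, msg, hdrs):
        return response  # returning a response stops the error chain

    methods = {"http_error_%d" % code: passthrough for code in codes}
    return type("TolerantHandler", (urllib.request.BaseHandler,), methods)


if __name__ == "__main__":
    opener = urllib.request.build_opener(make_tolerant_handler([201, 204, 206]))
    request = urllib.request.Request("http://example.invalid/")
    response = urllib.response.addinfourl(
        io.BytesIO(b""), {}, "http://example.invalid/", 201)

    # A listed code: the custom handler hands the response back unchanged.
    print(opener.error("http", request, response, 201, "Created", {}) is response)

    # An unlisted code still falls through to HTTPDefaultErrorHandler.
    try:
        opener.error("http", request, response, 500, "Server Error", {})
    except urllib.error.HTTPError as error:
        print("raised", error.code)
```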



Answer 3:


I personally think it was a mistake, and very unintuitive, for this to be the default behavior. It's true that non-2XX codes imply a protocol-level error, but turning that into an exception goes too far (in my opinion at least).

In any case, I think the most elegant way to avoid this is:

import urllib.request  # Python 3 name for urllib2

opener = urllib.request.build_opener()
for processor in opener.process_response['https']:  # or 'http', depending on what you're using
    if isinstance(processor, urllib.request.HTTPErrorProcessor):  # the same processor also serves https
        opener.process_response['https'].remove(processor)
        break  # there's only one such handler by default
response = opener.open('https://www.google.com')

Now you have the response object. You can check its status code, headers, body, etc.
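An alternative sketch that avoids editing the opener's internal lists: build_opener() leaves out a default handler when it is given a subclass of that handler, so a pass-through subclass of HTTPErrorProcessor replaces the stock one for both http and https. The names PassThroughErrorProcessor and build_non_raising_opener are hypothetical. One caveat applies to either variant: redirects and authentication retries are dispatched through the same error path, so an opener without the stock processor will most likely return 3xx and 401 responses as they are instead of acting on them.

```python
import urllib.request


class PassThroughErrorProcessor(urllib.request.HTTPErrorProcessor):
    """Hypothetical replacement that never routes a response to error handlers."""

    def http_response(self, request, response):
        return response  # hand every status code back to the caller

    https_response = http_response


def build_non_raising_opener():
    # Supplying the subclass makes build_opener() skip the stock HTTPErrorProcessor.
    return urllib.request.build_opener(PassThroughErrorProcessor)


if __name__ == "__main__":
    opener = build_non_raising_opener()
    for scheme in ("http", "https"):
        names = [type(p).__name__ for p in opener.process_response[scheme]]
        print(scheme, names)
```

With this opener, opener.open(url) is expected to return the response for any status code, leaving the caller to inspect response.status.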



Source: https://stackoverflow.com/questions/7032890/why-does-pythons-urllib2-urlopen-raise-an-httperror-for-successful-status-cod
