Why do I get urllib2.HTTPError with urllib2 but no errors with urllib?

Submitted anonymously (unverified) on 2019-12-03 03:00:02

Question:

I have the following simple code:

    import urllib2
    import sys
    sys.path.append('../BeautifulSoup/BeautifulSoup-3.1.0.1')
    from BeautifulSoup import *

    page = 'http://en.wikipedia.org/wiki/Main_Page'
    c = urllib2.urlopen(page)

This code generates the following error messages:

    c=urllib2.urlopen(page)
      File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
        return _opener.open(url, data)
      File "/usr/lib64/python2.4/urllib2.py", line 364, in open
        response = meth(req, response)
      File "/usr/lib64/python2.4/urllib2.py", line 471, in http_response
        response = self.parent.error(
      File "/usr/lib64/python2.4/urllib2.py", line 402, in error
        return self._call_chain(*args)
      File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.4/urllib2.py", line 480, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 403: Forbidden

But if I replace urllib2 with urllib, I get no error messages. Can anybody explain this behavior?

Answer 1:

The original urllib simply does not raise an exception on a 403 status code. If you add print c.getcode() at the end of your program, urllib will reach that line and still print 403.

And if you add print c.read() as well, you will see that you did indeed get an error page back from Wikipedia. It's just a matter of urllib2 treating a 403 as a runtime exception, whereas urllib still hands you the 403 response and lets you do something with the page.
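To see the two behaviors side by side, here is a minimal Python 2 sketch using the URL from the question (slicing the body to 200 characters is only to keep the output short):

    import urllib
    import urllib2

    page = 'http://en.wikipedia.org/wiki/Main_Page'

    # urllib hands back the response object even on a 403, so you can
    # inspect the status code and the error page yourself.
    c = urllib.urlopen(page)
    print c.getcode()     # 403
    print c.read()[:200]  # start of Wikipedia's error page

    # urllib2 raises urllib2.HTTPError for the same response instead.
    try:
        urllib2.urlopen(page)
    except urllib2.HTTPError, e:
        print e.code      # also 403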



Answer 2:

Wikipedia seems to be blocking requests that carry urllib2's default User-Agent header. Just change it.
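A minimal sketch of that fix; the "Mozilla/5.0" value below is only an illustrative placeholder, any string the site accepts will do:

    import urllib2

    page = 'http://en.wikipedia.org/wiki/Main_Page'

    # Replace urllib2's default User-Agent with a browser-like one
    # so the server does not reject the request outright.
    req = urllib2.Request(page, headers={'User-Agent': 'Mozilla/5.0'})
    c = urllib2.urlopen(req)
    print c.code  # 200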



Answer 3:

The post "Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway" shows a nice way to obtain the detailed error message from the server.
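In that spirit, here is a minimal sketch: urllib2.HTTPError is itself a file-like response object, so the server's error page can still be read from the caught exception.

    import urllib2

    try:
        c = urllib2.urlopen('http://en.wikipedia.org/wiki/Main_Page')
    except urllib2.HTTPError, e:
        # The exception doubles as the response: the status code,
        # headers, and error page body are all still available.
        print e.code          # 403
        print e.info()        # response headers
        print e.read()[:500]  # start of the server's error HTML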


