urllib2 returns 404 for a website which displays fine in browsers

我的梦境 submitted on 2019-11-29 04:13:40
MattH

I just tried this and received a 404 code and an error page back.

At a guess, the site is doing User-Agent detection which, either by accident or on purpose, doesn't serve content to Python's urllib.

Clarification: with urllib, urlopen returned a response object with a 404 code and HTML content. With urllib2.urlopen, a urllib2.HTTPError exception was raised instead.
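If you want urllib2 to give you the status code and error page the way urllib does, the HTTPError itself can be read like a response. Here is a minimal sketch of that idea. The function name open_tolerating_errors and its opener parameter are made up for illustration, and the import fallback is only there so the same snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (also exposes HTTPError)


def open_tolerating_errors(req, opener=urllib2.urlopen):
    """Return (status_code, body), even when the server answers with an error status.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urllib2.urlopen.
    """
    try:
        response = opener(req)
    except urllib2.HTTPError as e:
        # HTTPError doubles as a response object: it carries the
        # status code and the body of the error page.
        return e.code, e.read()
    return response.getcode(), response.read()
```

Called with the URL from the question, this should hand back 404 together with the HTML of the error page instead of raising.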

I'd suggest you try setting your User-Agent to something that looks like a browser. There's a question about this here: Changing user agent on urllib2.urlopen
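A rough sketch of what setting the header could look like. The User-Agent string and the helper names build_request and fetch are assumptions for illustration, not something taken from the site or the linked question; any realistic browser string should work equally well. The import fallback just lets the snippet run on Python 3 too.

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module

# Assumed browser-like User-Agent value, chosen only as an example.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends a browser-like User-Agent header."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Open url with the browser-like User-Agent and return (status, body)."""
    response = urllib2.urlopen(build_request(url))
    return response.getcode(), response.read()
```

If the 404 really comes from User-Agent detection, fetch(url) should now return the page. If it still raises HTTPError, the cause is probably something else.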

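You can use try/except to catch the error:

import urllib2

def fetch(req):
    try:
        u = urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        # the server answered with an error status such as 404
        print(e.code)
        print(e.msg)
        return None
    return u.read()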

Hm... are you sure that URL is valid? Try "http://www.google.com". I had similar code and there were no problems with urllib. Or you can use a try/except statement to see the error's details. And of course MattH's answer is probably very close to the truth :)
