urllib2 returns 404 for a website which displays fine in browsers

自闭症患者 2020-12-16 20:29

I am not able to open one particular URL using urllib2. The same approach works well with other websites such as "http://www.google.com", but not with this site (which also displays

3 Answers
  • 2020-12-16 20:54

    I just tried this and got a 404 status code along with an HTML page.

    My guess is that the site does User-Agent detection which, by accident or on purpose, does not serve content to Python's urllib.

    To clarify: with urllib, urlopen returned a response object with a 404 code and HTML content. With urllib2.urlopen, a urllib2.HTTPError exception was raised instead.

    I'd suggest setting your User-Agent to something that looks like a browser. There is a question about this here: Changing user agent on urllib2.urlopen
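
A minimal sketch of the User-Agent suggestion, assuming the site really does filter on that header (this has not been confirmed for the URL in the question). The names `build_request`, `fetch`, and `BROWSER_UA` are hypothetical, and the User-Agent string is just an example of a browser-like value. The import fallback lets the same code run on Python 3, where urllib2 became urllib.request.

```python
# Python 2 names the module urllib2; Python 3 moved it to urllib.request.
try:
    import urllib2 as request_lib
except ImportError:
    import urllib.request as request_lib

# Hypothetical browser-like value; any string a real browser sends would do.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return request_lib.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Open the URL with the custom User-Agent and return the body bytes."""
    response = request_lib.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

Calling `fetch(url)` with the problem URL should show whether the 404 goes away once the default Python User-Agent is replaced. If it does not, the cause is probably something other than User-Agent detection.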

  • 2020-12-16 20:57

    Hm... are you sure that URL is valid? Try "http://www.google.com". I had similar code and there were no problems with urllib. Alternatively, you can use a try/except statement to see the error's details. And of course, MattH's answer is very close to the truth :)

  • 2020-12-16 21:02

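    You can use try/except to catch the error:

    import urllib2

    def fetch(req):
        # req is a URL string or a urllib2.Request
        try:
            return urllib2.urlopen(req)
        except urllib2.HTTPError as e:
            # e.code is the HTTP status, e.msg the reason phrase
            print e.code
            print e.msg
            return None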
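
As a possible extension of the try/except approach, the HTTPError can also be read like a normal response, which recovers the HTML page that comes back with the 404. This is only a sketch: `fetch_status_and_body` and its `opener` parameter are hypothetical names, and the `opener` argument exists solely so the function can be exercised without network access.

```python
# Python 2 names the module urllib2; Python 3 split it into
# urllib.request and urllib.error.
try:
    import urllib2 as request_lib
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as request_lib
    from urllib.error import HTTPError


def fetch_status_and_body(url, opener=request_lib.urlopen):
    """Return (status_code, body), even when the server answers with an error."""
    try:
        response = opener(url)
    except HTTPError as e:
        # HTTPError doubles as a response: it has a code and a readable body.
        return e.code, e.read()
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

With this, a 404 from the site yields the status code and the error page's HTML instead of an uncaught exception, which makes it easier to see what the server actually sent.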
    