In Python, how do I use urllib to see if a website is 404 or 200?

北荒 2020-12-02 07:45

How can I get the HTTP status code of a response through urllib?

4 Answers
  • 2020-12-02 08:10

    For Python 3:

    import urllib.request, urllib.error
    
    url = 'http://www.google.com/asdfsf'
    try:
        conn = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        # Return code error (e.g. 404, 501, ...)
        # ...
        print('HTTPError: {}'.format(e.code))
    except urllib.error.URLError as e:
        # Not an HTTP-specific error (e.g. connection refused)
        # ...
        print('URLError: {}'.format(e.reason))
    else:
        # 200
        # ...
        print('good')
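
The pattern above can be wrapped into a small helper that returns the status code in every case. This is only a minimal sketch for Python 3: the name `get_status`, the `opener` parameter and the 10-second default timeout are illustrative assumptions, not part of urllib.

```python
import urllib.error
import urllib.request


def get_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None if no HTTP response arrived.

    get_status, opener and timeout are hypothetical names for this sketch;
    opener exists only so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status      # success, e.g. 200
    except urllib.error.HTTPError as e:
        return e.code               # error status, e.g. 404 or 501
    except urllib.error.URLError:
        return None                 # e.g. connection refused, DNS failure
```

Called as `get_status('http://www.google.com/asdfsf')`, this should give the same code that the `HTTPError` branch above prints.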
    
  • 2020-12-02 08:20
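    import urllib2
    
    try:
        fileHandle = urllib2.urlopen('http://www.python.org/fish.html')
        data = fileHandle.read()
        fileHandle.close()
    except urllib2.HTTPError as e:
        # HTTPError carries the status code (e.g. 404)
        print('you got an error with the code %s' % e.code)
    except urllib2.URLError as e:
        # No HTTP response at all (e.g. DNS failure), so only a reason is available
        print('you got an error: %s' % e.reason)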
    
  • 2020-12-02 08:31

    You can use urllib2 as well (Python 2 only):

    import urllib2
    
    req = urllib2.Request('http://www.python.org/fish.html')
    try:
        resp = urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        if e.code == 404:
            # do something...
            pass
        else:
            # ...
            pass
    except urllib2.URLError as e:
        # Not an HTTP-specific error (e.g. connection refused)
        # ...
        pass
    else:
        # 200
        body = resp.read()
    

    Note that HTTPError is a subclass of URLError that stores the HTTP status code in its code attribute, so the HTTPError clause must come before the URLError clause.
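
The subclass relationship is easy to check directly. The sketch below uses Python 3, where both classes live in `urllib.error`; `classify` is a hypothetical helper name introduced only for illustration.

```python
import urllib.error

# HTTPError derives from URLError, so an `except URLError` clause placed
# first would also catch 404s and other HTTP status errors.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True


def classify(exc):
    """Hypothetical helper: describe an exception raised by urlopen."""
    if isinstance(exc, urllib.error.HTTPError):   # check the subclass first
        return "HTTP status {}".format(exc.code)
    if isinstance(exc, urllib.error.URLError):
        return "no HTTP response: {}".format(exc.reason)
    return "unrelated error"
```

Testing for `HTTPError` before `URLError` mirrors the order of the `except` clauses in the answer above.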

  • 2020-12-02 08:37

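    The getcode() method (added in Python 2.6) returns the HTTP status code that was sent with the response, or None if the URL is not an HTTP URL. The example below uses Python 2's urllib.urlopen, which returns the response even for a 404 instead of raising.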

    >>> a=urllib.urlopen('http://www.google.com/asdfsf')
    >>> a.getcode()
    404
    >>> a=urllib.urlopen('http://www.google.com/')
    >>> a.getcode()
    200
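
In Python 3, `urllib.urlopen` no longer exists, and `urllib.request.urlopen` raises `HTTPError` for a 404 rather than returning a response, so `getcode()` is only reached on success. The sketch below illustrates this against a throwaway local server so that it does not depend on any external site. The handler, the `status_codes` function name and the 5-second timeout are assumptions made for the demo.

```python
import http.server
import threading
import urllib.error
import urllib.request


class _Handler(http.server.BaseHTTPRequestHandler):
    """Toy handler for this sketch: 200 for "/", 404 for anything else."""

    def do_GET(self):
        self.send_response(200 if self.path == "/" else 404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def status_codes():
    """Fetch "/" and a missing path from a throwaway local server."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:{}".format(server.server_address[1])
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base + "/", timeout=5) as resp:
            ok = resp.getcode()              # same value as resp.status
        try:
            opener.open(base + "/asdfsf", timeout=5)
            missing = None
        except urllib.error.HTTPError as e:  # Python 3 raises on 404
            missing = e.code
    finally:
        server.shutdown()
        server.server_close()
    return ok, missing
```

Under these assumptions, `status_codes()` should return `(200, 404)`, matching the two codes shown in the Python 2 session above.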
    