How to get the code of the headers through urllib?
The getcode() method (Added in python2.6) returns the HTTP status code that was sent with the response, or None if the URL is no HTTP URL.
>>> a=urllib.urlopen('http://www.google.com/asdfsf')
>>> a.getcode()
404
>>> a=urllib.urlopen('http://www.google.com/')
>>> a.getcode()
200
Joe Holloway
You can use urllib2 as well:
import urllib2
req = urllib2.Request('http://www.python.org/fish.html')
try:
resp = urllib2.urlopen(req)
except urllib2.HTTPError as e:
if e.code == 404:
# do something...
else:
# ...
except urllib2.URLError as e:
# Not an HTTP-specific error (e.g. connection refused)
# ...
else:
# 200
body = resp.read()
Note that HTTPError
is a subclass of URLError
which stores the HTTP status code.
XavierCLL
For Python 3:
import urllib.request, urllib.error
url = 'http://www.google.com/asdfsf'
try:
conn = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
# Return code error (e.g. 404, 501, ...)
# ...
print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
# Not an HTTP-specific error (e.g. connection refused)
# ...
print('URLError: {}'.format(e.reason))
else:
# 200
# ...
print('good')
import urllib2
try:
fileHandle = urllib2.urlopen('http://www.python.org/fish.html')
data = fileHandle.read()
fileHandle.close()
except urllib2.URLError, e:
print 'you got an error with the code', e
来源:https://stackoverflow.com/questions/1726402/in-python-how-do-i-use-urllib-to-see-if-a-website-is-404-or-200