Question:
I want to check whether a certain website exists. This is what I'm doing:
    import urllib2

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com"
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()  # ERROR 402 generated here!
If the page doesn't exist (error 402, or any other error), what can I do at the page = ... line to make sure that the page I'm reading does exist?
Answer 1:
You can use a HEAD request instead of GET. It downloads only the headers, not the content, and you can then check the response status.
    import httplib

    c = httplib.HTTPConnection('www.example.com')
    c.request("HEAD", '/')  # the request path must not be empty
    if c.getresponse().status == 200:
        print('web site exists')
Or you can use urllib2:

    import urllib2

    try:
        urllib2.urlopen('http://www.example.com/some_page')
    except urllib2.HTTPError as e:
        print(e.code)
    except urllib2.URLError as e:
        print(e.args)
Or you can use requests:

    import requests

    response = requests.get('http://www.example.com')
    if response.status_code == 200:
        print('Web site exists')
    else:
        print('Web site does not exist')
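The httplib and urllib2 snippets above are Python 2. For Python 3, a minimal sketch of the same HEAD idea with only the standard library could look like this. The function name `url_exists`, the 5 second timeout, and the "anything below 400 counts as existing" cutoff are my own assumptions, not something from the question.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=5.0):
    """Return True if a HEAD request to url ends in a non-error status.

    Hypothetical helper: the name, timeout and < 400 cutoff are assumptions.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status.
        return False
    except OSError:
        # URLError is an OSError subclass, so this covers DNS failures and
        # refused connections as well as a raw socket timeout.
        return False
```

Some servers reject or mishandle HEAD, so falling back to a normal GET when HEAD fails may be worth considering.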
Answer 2:
It's better to check the status code. Here is what the status codes mean (taken from Wikipedia):
1xx - informational
2xx - success
3xx - redirection
4xx - client error
5xx - server error
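As a quick illustration of those classes, the first digit of the code is enough to tell them apart. This is only a sketch: `status_class` and `looks_reachable` are names I made up, and treating 2xx and 3xx as "the page is there" is a judgment call.

```python
def status_class(code):
    """Map an HTTP status code to the name of its class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 keeps only the leading digit of a 3-digit code.
    return classes.get(code // 100, "unknown")


def looks_reachable(code):
    """Assumption: 2xx and 3xx mean the page is there, 4xx/5xx mean it is not."""
    return 200 <= code < 400
```

For example, `status_class(404)` gives `"client error"`, and `looks_reachable(402)` is `False`.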
If you want to check whether a page exists without downloading the whole page, use a HEAD request:
    import httplib2

    h = httplib2.Http()
    resp = h.request("http://www.google.com", 'HEAD')
    assert int(resp[0]['status']) < 400
taken from this answer.
If you want to download the whole page, just make a normal request and check the status code. Example using requests:
    import requests

    response = requests.get('http://google.com')
    assert response.status_code < 400
Hope that helps.
Answer 3:
    from urllib2 import Request, urlopen, HTTPError, URLError

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com/"
    req = Request(link, headers=headers)
    try:
        page_open = urlopen(req)
    except HTTPError as e:
        print e.code
    except URLError as e:
        print e.reason
    else:
        print 'ok'
To answer the comment of unutbu:
Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range. Source
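To see this without touching a real site, here is a rough Python 3 sketch (urllib2 became urllib.request there) that runs a throwaway local server. Everything named here is made up for the illustration: `DemoHandler`, `observed_status`, `run_demo`, and the `/old` and `/new` paths.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects to /new, anything else is a 404."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new":
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def observed_status(url):
    """Return the status code the caller ends up seeing for url."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code


def run_demo():
    """Start the server on a free local port and probe two paths."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        return observed_status(base + "/old"), observed_status(base + "/missing")
    finally:
        server.shutdown()
        server.server_close()
```

`run_demo()` should come back as `(200, 404)`: the 302 never reaches the caller because the default redirect handler follows it, while the 404 arrives as an `HTTPError`.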
Answer 4:
code:
    import urllib

    a = "http://www.example.com"
    try:
        print urllib.urlopen(a)
    except IOError:
        print a + " site does not exist"
Answer 5:
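    import urllib.request
    from urllib.error import HTTPError, URLError

    def isok(mypath):
        try:
            thepage = urllib.request.urlopen(mypath)
        except HTTPError as e:
            return 0  # server replied with a 4xx/5xx status
        except URLError as e:
            return 0  # could not reach the server at all
        else:
            return 1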