Question:

I want to check whether a certain website exists. This is what I'm doing:

    import urllib2

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com"
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page doesn't exist (error 402, or any other error), what can I do in the page = ... line to make sure that the page I'm reading does exist?

Answer 1:

You can send a HEAD request instead of a GET. It downloads only the headers, not the body, and you can then check the response status code.

    import httplib

    c = httplib.HTTPConnection('www.example.com')
    # Request the root path; an empty path is not a valid request target.
    c.request("HEAD", '/')
    if c.getresponse().status == 200:
        print('web site exists')

Or you can use urllib2:

    import urllib2

    try:
        urllib2.urlopen('http://www.example.com/some_page')
    except urllib2.HTTPError as e:
        # The server responded with an error status code.
        print(e.code)
    except urllib2.URLError as e:
        # The server could not be reached at all.
        print(e.args)

Or you can use requests:

    import requests

    response = requests.get('http://www.example.com')
    if response.status_code == 200:
        print('Web site exists')
    else:
        print('Web site does not exist')
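The snippets above target Python 2. As a rough Python 3 sketch of the same HEAD idea, using only the standard library (where urllib2 became urllib.request): the function name `url_exists`, the `timeout` default, and the choice to treat any status below 400 as "exists" are assumptions made for illustration, not part of the original answer.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=5.0):
    """Return True if a HEAD request to `url` gets a non-error response."""
    # method="HEAD" asks the server for the headers only, not the body.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx or 5xx status.
        return False
    except urllib.error.URLError:
        # No usable answer: DNS failure, refused connection, and similar.
        return False
```

HTTPError is a subclass of URLError, so it has to be caught first. Some servers reject HEAD requests, so falling back to a GET may be needed in practice.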

Answer 2:

It's better to check the status code. Here is what the status code classes mean (taken from Wikipedia):

1xx - informational
2xx - success
3xx - redirection
4xx - client error
5xx - server error
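The classes listed above can be looked up from the leading digit of the code. The sketch below is only an illustration: `status_class` and `looks_reachable` are hypothetical helper names, and treating 2xx and 3xx as "the page exists" is an assumption that matches the checks used elsewhere in this thread.

```python
def status_class(code):
    """Map an HTTP status code to the name of its class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 keeps only the leading digit of a 3-digit code.
    return classes.get(code // 100, "unknown")


def looks_reachable(code):
    """Treat 2xx and 3xx responses as 'the page exists' (an assumption)."""
    return 200 <= code < 400
```

For example, `status_class(404)` returns "client error" and `looks_reachable(404)` returns False.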

If you want to check whether a page exists without downloading the whole page, you should use a HEAD request:

    import httplib2

    h = httplib2.Http()
    # h.request returns (response, content); the response holds the status.
    resp = h.request("http://www.google.com", 'HEAD')
    assert int(resp[0]['status']) < 400

taken from this answer.

If you want to download the whole page, just make a normal request and check the status code. Example using requests:

    import requests

    response = requests.get('http://google.com')
    assert response.status_code < 400

Answer 3:

    from urllib2 import Request, urlopen, HTTPError, URLError

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com/"
    req = Request(link, headers=headers)
    try:
        page_open = urlopen(req)
    except HTTPError as e:
        # The server responded with an error status code.
        print(e.code)
    except URLError as e:
        # The server could not be reached.
        print(e.reason)
    else:
        print('ok')

To answer the comment of unutbu:

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range. Source
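To make that behavior concrete, here is a small Python 3 sketch (urllib.request is the Python 3 successor of urllib2). The function name `fetch_status` and the `timeout` default are assumptions for illustration. A redirect is followed silently, so the caller sees the final status, while a 4xx or 5xx response surfaces as an HTTPError carrying the code.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the final HTTP status code for `url`, or None if no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Redirects were already followed, so this is the final status.
            return resp.status
    except urllib.error.HTTPError as e:
        # Raised for statuses the default handlers treat as errors.
        return e.code
    except urllib.error.URLError:
        # No HTTP response at all (for example, connection refused).
        return None
```

Returning None for an unreachable host keeps "the server said no" distinct from "there was no server to ask".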

Answer 4:

Code:

    import urllib

    a = "http://www.example.com"
    try:
        print(urllib.urlopen(a))
    except IOError:
        # urllib.urlopen raises IOError when the connection cannot be made.
        print(a + "  site does not exist")

Answer 5:

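    import urllib.request
    from urllib.error import HTTPError, URLError

    def isok(mypath):
        # Return 1 if the URL opens without error, 0 otherwise.
        try:
            thepage = urllib.request.urlopen(mypath)
        except HTTPError:
            return 0
        except URLError:
            return 0
        else:
            return 1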

Source: Python check if website exists

Tags

exists

python