Question
I'm new to Python. I've been trying to get the source code of a page and have tried several methods on both Python 2 and 3 (here's one):
import urllib
url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
f = urllib.urlopen(url)
source = f.read()
print source
but I keep getting the following error:
Traceback (most recent call last):
  File "C:\Python34\openpage.py", line 4, in <module>
    f = urllib.urlopen(url)
  File "C:\Python27\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 443, in open_https
    h.endheaders(data)
  File "C:\Python27\lib\httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 893, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 855, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 1274, in connect
    server_hostname=server_hostname)
  File "C:\Python27\lib\ssl.py", line 352, in wrap_socket
    _context=self)
  File "C:\Python27\lib\ssl.py", line 579, in __init__
    self.do_handshake()
  File "C:\Python27\lib\ssl.py", line 808, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
The last line suggests that the error comes from the secure (HTTPS) connection, but I can't seem to find a way around it.
I have looked at this post, but still had no success.
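For comparison, here is a rough Python 3 version of the question's snippet, offered as a sketch rather than a fix: in Python 3, urlopen lives in urllib.request, and read() returns bytes that need decoding. The helper name fetch_source, the 10-second timeout and the UTF-8 fallback are assumptions. If the certificate problem comes from the machine's CA store, this would most likely fail with the same CERTIFICATE_VERIFY_FAILED error.

```python
import urllib.request


def fetch_source(url, timeout=10):
    """Return the body of `url` as text (hypothetical helper, Python 3)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header; fall back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_source(url) with the Google URL from the question would return the page as a str rather than bytes.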
Answer 1:
Here's some sample code you can try on Python 3, using urlparse:
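import http.client
from urllib.parse import urlparse
url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
p = urlparse(url)
# HTTPConnection speaks plain HTTP (port 80 by default), so no TLS handshake
# or certificate check happens, even though the URL says https
conn = http.client.HTTPConnection(p.netloc)
# Only the path ("/") is sent; the query string and "#q=test" fragment are not
conn.request('GET', p.path)
resp = conn.getresponse()
print('resp= {}'.format(resp.read()))
conn.close()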
The result depends on the arguments you pass to conn.request(), though. You could try other methods, such as HEAD, and the response will change accordingly.
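To make the method switch concrete, here is a sketch that wraps the same http.client calls in a function taking the method as a parameter. The names request_target and fetch are hypothetical. Unlike the snippet above, it also forwards the query string and picks HTTPSConnection for https URLs; on Python 3.4.3 and later that class verifies the server certificate by default, so on a machine with a broken CA setup it may raise the same certificate error.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url):
    """Path plus query string; the #fragment is client-side and never sent."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def fetch(url, method="GET", timeout=10):
    """Send `method` to `url` and return (status, body) with body as bytes."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request(method, request_target(url))
        resp = conn.getresponse()
        # For HEAD the server sends headers only, so read() returns b""
        return resp.status, resp.read()
    finally:
        conn.close()
```

For example, fetch(url, "HEAD") should return a status code with an empty body, while fetch(url) returns the full page.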
If you want to test whether your request worked or not, you can always try:
print(resp.status)
In this case, it gives 200. The list of status codes is available here.
Some other examples can be found as well.
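Rather than looking codes up by hand, the standard library already carries the reason phrases: http.client.responses maps each status code to its phrase. A small sketch, where describe_status and is_success are hypothetical helper names:

```python
import http.client


def describe_status(code):
    """Return a label such as '200 OK' for a numeric HTTP status code."""
    return "{} {}".format(code, http.client.responses.get(code, "Unknown"))


def is_success(code):
    """Status codes in the 2xx range mean the request succeeded."""
    return 200 <= code < 300
```

With the answer's snippet, print(describe_status(resp.status)) would print a label such as "200 OK" instead of a bare number.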
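Answer 2:
You are using HTTPS, which is a secure protocol, and the error says:
SSL: CERTIFICATE_VERIFY_FAILED
That is, Python could not verify the server's certificate. Try plain HTTP instead, or set up certificate verification with the ssl module: https://docs.python.org/2/library/ssl.html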
url = "http://www.google.ca"
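If you would rather stay on HTTPS than fall back to HTTP, the usual route with the ssl module is an SSLContext that still verifies certificates but is told which CA certificates to trust. The sketch below is Python 3 (urllib.request.urlopen accepts context= from 3.4.3, and urllib2.urlopen from 2.7.9); make_context and fetch_verified are hypothetical names, and the cafile argument stands for a PEM bundle you supply yourself.

```python
import ssl
import urllib.request


def make_context(cafile=None):
    """Certificate-verifying context; cafile is an optional PEM bundle path (assumption)."""
    # create_default_context() sets CERT_REQUIRED and check_hostname=True;
    # with cafile=None it loads the system's default trusted CAs instead
    return ssl.create_default_context(cafile=cafile)


def fetch_verified(url, cafile=None, timeout=10):
    """Fetch `url` as bytes, verifying the server against the chosen CAs."""
    ctx = make_context(cafile)
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read()
```

Keeping CERT_REQUIRED and only changing which CAs are trusted preserves the protection HTTPS is meant to provide, whereas switching to plain HTTP gives it up.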
Source: https://stackoverflow.com/questions/30858979/error-reading-webpage-source-code-using-python