Error reading webpage source code using Python

Submitted by 半腔热情 on 2020-01-02 19:59:08

Question


I'm new to Python and have been trying to get the source code of a page. I have tried several methods on both Python 2 and 3; here is one:

import urllib

url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
f = urllib.urlopen(url)
source = f.read()
print source

but I keep getting the following error:

Traceback (most recent call last):
  File "C:\Python34\openpage.py", line 4, in <module>
    f = urllib.urlopen(url)
  File "C:\Python27\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 443, in open_https
    h.endheaders(data)
  File "C:\Python27\lib\httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 893, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 855, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 1274, in connect
    server_hostname=server_hostname)
  File "C:\Python27\lib\ssl.py", line 352, in wrap_socket
    _context=self)
  File "C:\Python27\lib\ssl.py", line 579, in __init__
    self.do_handshake()
  File "C:\Python27\lib\ssl.py", line 808, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

The last line suggests that the error comes from the secure search, but I can't seem to find a way around it.

I have looked at this post, but still no success.


Answer 1:


Here's some sample code you can try on Python 3, using urlparse:

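import http.client
from urllib.parse import urlparse

url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
p = urlparse(url)
# HTTPConnection speaks plain HTTP on port 80, so no certificate check takes place
conn = http.client.HTTPConnection(p.netloc)
# Only the path ("/") is requested; the query string and fragment are not sent
conn.request('GET', p.path)
resp = conn.getresponse()
print('resp= {}'.format(resp.read()))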

The response depends on the arguments you pass to conn.request(), though. If you use another method such as HEAD, for example, the response changes accordingly.
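As a rough illustration of what the request can carry: urlparse splits the URL into parts, p.path alone leaves out the query string, and the fragment after '#' is never sent to a server. The helper name request_target below is hypothetical, and the shortened URL is only an example.

```python
from urllib.parse import urlparse


def request_target(url):
    """Return the path plus query string to pass to conn.request()."""
    p = urlparse(url)
    path = p.path or "/"          # an empty path means the site root
    if p.query:
        return path + "?" + p.query
    return path                   # the fragment (after '#') is left out on purpose


print(request_target("https://www.google.ca/?gfe_rd=cr&gws_rd=ssl#q=test"))
# prints: /?gfe_rd=cr&gws_rd=ssl
```

Passing request_target(url) instead of p.path to conn.request() would keep the query parameters in the request.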

If you want to test whether your request worked or not, you can always try:

print(resp.status)

In this case, it gives 200. The list of status codes is available here.
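If you would rather not look the codes up by hand, the standard library's http.HTTPStatus enum (Python 3.5+) carries the reason phrase for each code. A minimal sketch, where describe_status is a hypothetical helper name:

```python
from http import HTTPStatus


def describe_status(code):
    """Turn a numeric status such as resp.status into 'code phrase'."""
    try:
        return "{} {}".format(code, HTTPStatus(code).phrase)
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define
        return "{} (unknown status code)".format(code)


print(describe_status(200))  # prints: 200 OK
print(describe_status(404))  # prints: 404 Not Found
```

With the connection example above, this would be called as describe_status(resp.status).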

Some other examples can be found as well.




Answer 2:


You are using https, which is a secure protocol. The error says:

SSL: CERTIFICATE_VERIFY_FAILED

Try http, or use the ssl module: https://docs.python.org/2/library/ssl.html

url = "http://www.google.ca"
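For the ssl route on Python 3, one possible shape is to build a verifying context and pass it to urllib.request.urlopen. This is a sketch under assumptions, not a confirmed fix for the traceback above: the helper names make_verified_context and fetch_source are hypothetical, and the optional cafile argument assumes you have a CA bundle file to point at when the system certificate store is incomplete.

```python
import ssl
import urllib.request


def make_verified_context(cafile=None):
    """Build an SSL context that verifies certificates and hostnames.

    cafile is an optional path to a CA bundle (an assumption for this sketch);
    with None, the system's default trust store is used.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch_source(url, context):
    """Download a page over HTTPS with the given context and return its text."""
    with urllib.request.urlopen(url, context=context) as f:
        return f.read().decode("utf-8", errors="replace")


context = make_verified_context()
# source = fetch_source("https://www.google.ca/", context)  # needs network access
```

Keeping verification on and supplying the right CA certificates is generally safer than falling back to plain http, since an http request travels unencrypted.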


Source: https://stackoverflow.com/questions/30858979/error-reading-webpage-source-code-using-python
