(Python 3.4.2) Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.
Here's what I'm trying to do (pretty bas
I had the same error when I tried to open a url with https, but no errors with http.
>>> from urllib.request import urlopen
>>> urlopen('http://google.com')
<http.client.HTTPResponse object at 0xb770252c>
>>> urlopen('https://google.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 548, in _open
    'unknown_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1387, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>
This was done on Ubuntu 16.04 using Python 3.7. Ubuntu's native Python is 3.5 in /usr/bin; I had previously downloaded the 3.7 source and installed it under /usr/local/bin. The fact that 3.5 gave no error suggested that the 3.7 build had not been linked against the system OpenSSL (/usr/bin/openssl), which is also evident below:
>>> import ssl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/ssl.py", line 98, in <module>
    import _ssl  # if we can't import it, let the error propagate
ModuleNotFoundError: No module named '_ssl'
Following the linked instructions, I changed SSL=/usr/local/ssl to SSL=/usr in Modules/Setup.dist in the 3.7 source directory, copied that file to Modules/Setup, and then rebuilt Python 3.7.
$ ./configure
$ make
$ make install
Now it is fixed:
>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2g 1 Mar 2016'
>>> urlopen('https://www.google.com')
<http.client.HTTPResponse object at 0xb74c4ecc>
>>> urlopen('https://www.google.com').read()
b'<!doctype html>...
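and 3.7 has been compiled with OpenSSL support successfully. Note that running "openssl version" in the Ubuntu shell only tells you about the system OpenSSL; to confirm that Python itself was built against it, check ssl.OPENSSL_VERSION from inside Python.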
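A quick way to check a given interpreter for this problem is to ask it directly. The following is a minimal sketch, not part of the original answer; the function name https_support_report is made up for illustration. It relies on the fact that urllib.request only defines HTTPSHandler when http.client has HTTPSConnection, which in turn requires a working ssl import.

```python
import http.client
import urllib.request

try:
    import ssl  # fails with ImportError if the _ssl extension was not built
except ImportError:
    ssl = None


def https_support_report():
    """Return a dict describing whether this interpreter can open https URLs.

    Hypothetical helper for illustration only.
    """
    return {
        "has_ssl": ssl is not None,
        # http.client defines HTTPSConnection only when ssl imported cleanly.
        "has_https_connection": hasattr(http.client, "HTTPSConnection"),
        # urllib.request registers HTTPSHandler only in that same case;
        # without it, https URLs fall through to "unknown url type".
        "has_https_handler": hasattr(urllib.request, "HTTPSHandler"),
        "openssl_version": ssl.OPENSSL_VERSION if ssl is not None else None,
    }


if __name__ == "__main__":
    for key, value in https_support_report().items():
        print(key, "=", value)
```

Running this with each installed interpreter (for example the one in /usr/bin and the one in /usr/local/bin) should show which build is missing SSL support.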
urllib.error.URLError: <urlopen error unknown url type: 'https>
The 'https (rather than https) in the error message indicates that you did not request an https:// URL but a 'https:// URL, which of course is not a valid scheme. Check how you construct your URL.
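If the URL comes from user input or a config file, stray quotes like this are easy to pick up. Here is a small sketch of one way to guard against them before calling urlopen; the helper name clean_url and the choice to accept only http and https are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def clean_url(raw):
    """Strip surrounding whitespace and quote characters, then check the scheme.

    Hypothetical helper: raises ValueError unless the result is http or https.
    """
    url = raw.strip().strip("'\"")
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError("unsupported or malformed URL scheme: %r" % scheme)
    return url


if __name__ == "__main__":
    print(clean_url("'https://github.com'"))
```

Rejecting unexpected schemes early gives a clearer error than the "unknown url type" message raised deep inside urllib.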
Double-check your compilation options; it looks like something is wrong with your machine.
At least the following code works for me:
from urllib.request import urlopen
resp = urlopen('https://github.com')
print(resp.read())
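This may help. Note that it turns off certificate verification, so use it only for testing:
import ssl
import urllib.request

# Build a context that skips hostname and certificate checks (insecure).
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()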
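For comparison, here is a sketch of the same request with verification left on, which is what ssl.create_default_context() gives you if you do not change its settings. The function names make_verified_context and fetch, and the 10-second timeout, are illustrative assumptions rather than anything from the answers above. Neither version helps if the interpreter was built without the _ssl module; in that case "import ssl" itself fails.

```python
import ssl
import urllib.request


def make_verified_context():
    """Return a default client context: certificates and hostnames are checked."""
    return ssl.create_default_context()


def fetch(url, timeout=10):
    """Fetch a URL and return the response body as bytes (hypothetical helper)."""
    ctx = make_verified_context()
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    body = fetch("https://github.com")
    print(len(body), "bytes received")
```

If this version raises a certificate error while the unverified one works, the likely cause is a missing or outdated CA bundle rather than a problem with urllib.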