urllib cannot read https

温柔的废话 2020-12-10 05:35

(Python 3.4.2) Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.

Here's what I'm trying to do (pretty bas

4 answers
  • 2020-12-10 05:49

    I had the same error when I tried to open a url with https, but no errors with http.

    >>> from urllib.request import urlopen
    >>> urlopen('http://google.com')
    <http.client.HTTPResponse object at 0xb770252c>
    >>> urlopen('https://google.com')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
        response = self._open(req, data)
      File "/usr/local/lib/python3.7/urllib/request.py", line 548, in _open
        'unknown_open', req)
      File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/usr/local/lib/python3.7/urllib/request.py", line 1387, in unknown_open
        raise URLError('unknown url type: %s' % type)
    urllib.error.URLError: <urlopen error unknown url type: https>
    

    This was on Ubuntu 16.04 with Python 3.7. Ubuntu ships Python 3.5 in /usr/bin by default, and I had previously built 3.7 from source and installed it under /usr/local/bin. Since 3.5 showed no error, the problem pointed to OpenSSL support not being built correctly into 3.7, which is also evident below:

    >>> import ssl
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.7/ssl.py", line 98, in <module>
        import _ssl             # if we can't import it, let the error propagate
    ModuleNotFoundError: No module named '_ssl'
    

    Following this link, I changed SSL=/usr/local/ssl to SSL=/usr in Modules/Setup.dist in the 3.7 source directory, copied that file to Setup, and then rebuilt Python 3.7.

    $ ./configure
    $ make
    $ make install
    

    Now it is fixed:

    >>> import ssl
    >>> ssl.OPENSSL_VERSION
    'OpenSSL 1.0.2g  1 Mar 2016'
    >>> urlopen('https://www.google.com') 
    <http.client.HTTPResponse object at 0xb74c4ecc>
    >>> urlopen('https://www.google.com').read()
    b'<!doctype html>...
    

    Python 3.7 has now been compiled with OpenSSL support successfully. Note that a working "openssl version" command in the Ubuntu shell is not enough on its own; the check is only complete once the ssl module loads inside Python.
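
    The check described in this answer can be scripted. The following is a minimal sketch, not part of the original answer: the function name `https_support_report` and the keys of the returned dict are made up for illustration. It relies on the fact that `urllib.request` only defines `HTTPSHandler` when the `ssl` module can be imported.

```python
import importlib
import urllib.request


def https_support_report():
    """Return a dict describing whether this interpreter can open https URLs.

    The function name and dict keys are hypothetical, chosen for this sketch.
    """
    report = {"ssl_module": False, "openssl_version": None, "https_handler": False}
    try:
        # Fails with ModuleNotFoundError (an ImportError) if _ssl was not built.
        ssl = importlib.import_module("ssl")
    except ImportError:
        return report
    report["ssl_module"] = True
    report["openssl_version"] = ssl.OPENSSL_VERSION
    # urllib.request defines HTTPSHandler only when http.client provides
    # HTTPSConnection, which in turn requires the ssl module.
    report["https_handler"] = hasattr(urllib.request, "HTTPSHandler")
    return report


if __name__ == "__main__":
    print(https_support_report())
```

    Running this with each installed interpreter (for example the one in /usr/bin and the one in /usr/local/bin) should show which build is missing SSL support.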

  • 2020-12-10 05:51
    urllib.error.URLError: <urlopen error unknown url type: 'https>
    

    The 'https (with a leading quote) rather than https in the error message indicates that you did not request an https:// URL but a 'https:// URL, and that scheme of course does not exist. Check how you construct your URL.
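
    One way to catch this kind of mistake before calling `urlopen` is to normalize and validate the URL first. This is only a sketch under the assumption that the stray characters are surrounding quotes or whitespace; the helper name `clean_url` is hypothetical.

```python
from urllib.parse import urlsplit


def clean_url(raw):
    """Strip whitespace and stray quotes, then check the scheme is http or https.

    clean_url is a hypothetical helper written for this sketch.
    """
    url = raw.strip().strip("'\"")
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError("unexpected URL scheme %r in %r" % (scheme, raw))
    return url
```

    For example, `clean_url("'https://google.com'")` returns the URL without the quotes, while a string with some other scheme raises `ValueError` instead of failing later inside urllib.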

  • 2020-12-10 05:53

    Double-check your compilation options; it looks like something is wrong with your machine.

    At least the following code works for me:

    from urllib.request import urlopen
    resp = urlopen('https://github.com')
    print(resp.read())
    
  • 2020-12-10 06:01

    This may help. It ignores SSL certificate errors:

    import ssl
    import urllib.request

    # Build a context that skips certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    
    url = input('Enter - ')
    html = urllib.request.urlopen(url, context=ctx).read()
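
    Note that skipping verification addresses certificate errors only; it still needs an importable `ssl` module, so it would not help with "unknown url type: https". As a rough sketch, the two kinds of context can be built side by side so that verification is only turned off deliberately. The helper name `make_context` and its `verify` parameter are assumptions for illustration.

```python
import ssl


def make_context(verify=True):
    """Build an SSLContext; verify=False disables certificate and hostname checks.

    make_context is a hypothetical helper written for this sketch.
    """
    ctx = ssl.create_default_context()
    if not verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

    The result can be passed as `urllib.request.urlopen(url, context=make_context())`, falling back to `make_context(verify=False)` only for hosts whose certificates are known to be broken.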
    