Urllib and validation of server certificate

旧巷少年郎 2020-12-23 22:52

I use Python 2.6 and call the Facebook API (over HTTPS). I suspect my service could be the target of man-in-the-middle attacks. I discovered this morning, rereading the urllib module documentation, that urllib does not attempt to validate the server certificate.

2 Answers
  • 2020-12-23 23:50

    You could create a urllib2 opener which does the validation for you using a custom handler. The following code is an example that works with Python 2.7.3. It assumes you have downloaded http://curl.haxx.se/ca/cacert.pem to the same folder where the script is saved.

    #!/usr/bin/env python
    import urllib2
    import httplib
    import ssl
    import socket
    import os
    
    CERT_FILE = os.path.join(os.path.dirname(__file__), 'cacert.pem')
    
    
    class ValidHTTPSConnection(httplib.HTTPConnection):
        "This class allows communication via SSL."

        default_port = httplib.HTTPS_PORT

        def __init__(self, *args, **kwargs):
            httplib.HTTPConnection.__init__(self, *args, **kwargs)

        def connect(self):
            "Connect to a host on a given (SSL) port."

            sock = socket.create_connection((self.host, self.port),
                                            self.timeout, self.source_address)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()
            self.sock = ssl.wrap_socket(sock,
                                        ca_certs=CERT_FILE,
                                        cert_reqs=ssl.CERT_REQUIRED)


    class ValidHTTPSHandler(urllib2.HTTPSHandler):

        def https_open(self, req):
            return self.do_open(ValidHTTPSConnection, req)
    
    opener = urllib2.build_opener(ValidHTTPSHandler)
    
    
    def test_access(url):
        print "Accessing", url
        page = opener.open(url)
        print page.info()
        data = page.read()
        print "First 100 bytes:", data[0:100]
        print "Done accessing", url
        print ""
    
    # This should work
    test_access("https://www.google.com")
    
    # Accessing a page with a self signed certificate should not work
    # At the time of writing, the following page uses a self signed certificate
    test_access("https://tidia.ita.br/")
    

    Running this script you should see output like this:

    Accessing https://www.google.com
    Date: Mon, 14 Jan 2013 14:19:03 GMT
    Expires: -1
    ...
    
    First 100 bytes: <!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage"><head><meta itemprop
    Done accessing https://www.google.com
    
    Accessing https://tidia.ita.br/
    Traceback (most recent call last):
      File "https_validation.py", line 54, in <module>
        test_access("https://tidia.ita.br/")
      File "https_validation.py", line 42, in test_access
        page = opener.open(url)
      ...
      File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1177, in do_open
        raise URLError(err)
    urllib2.URLError: <urlopen error [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed>
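For reference: from Python 2.7.9 and in Python 3, a handler like this is no longer necessary, because urllib validates certificates by default using `ssl.create_default_context()`. A minimal Python 3 sketch of those defaults (not part of the answer above):

```python
import ssl

# Default client-side context: loads the system CA store and turns on
# both certificate verification and hostname checking.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: peer cert must validate
print(ctx.check_hostname)                    # True: hostname is checked too

# Passing the context explicitly (network call, so left commented out):
# import urllib.request
# urllib.request.urlopen("https://www.google.com", context=ctx)
```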
    
  • 2020-12-23 23:52

    If you have a trusted Certificate Authority (CA) file, you can use Python 2.6 and later's ssl library to validate the certificate. Here's some code:

    import os.path
    import ssl
    import sys
    import urlparse
    import urllib
    
    def get_ca_path():
        '''Download the Mozilla CA file cached by the cURL project.
    
        If you have a trusted CA file from your OS, return the path
        to that instead.
        '''
        cafile_local = 'cacert.pem'
        cafile_remote = 'http://curl.haxx.se/ca/cacert.pem'
        if not os.path.isfile(cafile_local):
            print >> sys.stderr, "Downloading %s from %s" % (
                cafile_local, cafile_remote)
            urllib.urlretrieve(cafile_remote, cafile_local)
        return cafile_local
    
    def check_ssl(hostname, port=443):
        '''Check that an SSL certificate is valid.'''
        print >> sys.stderr, "Validating SSL cert at %s:%d" % (
            hostname, port)
    
        cafile_local = get_ca_path()
        try:
            server_cert = ssl.get_server_certificate((hostname, port),
                ca_certs=cafile_local)
        except ssl.SSLError:
            print >> sys.stderr, "SSL cert at %s:%d is invalid!" % (
                hostname, port)
            raise 
    
    class CheckedSSLUrlOpener(urllib.FancyURLopener):
        '''A URL opener that checks that SSL certificates are valid.

        On SSL error, it will raise ssl.SSLError.
        '''
    
        def open(self, fullurl, data=None):
            urlbits = urlparse.urlparse(fullurl)
            if urlbits.scheme == 'https':
                # urlparse exposes .hostname and .port (an int, or None);
                # this avoids passing a string port into check_ssl's "%d".
                if urlbits.port is None:
                    port = 443
                else:
                    port = urlbits.port
                check_ssl(urlbits.hostname, port)
            return urllib.FancyURLopener.open(self, fullurl, data)
    
    # Plain usage - can probably do once per day
    check_ssl('www.facebook.com')
    
    # URL Opener
    opener = CheckedSSLUrlOpener()
    opener.open('https://www.facebook.com/find-friends/browser/')
    
    # Make it the default
    urllib._urlopener = opener
    urllib.urlopen('https://www.facebook.com/find-friends/browser/')
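For the host-and-port extraction step, the standard library parser already returns the port as an integer, so no manual `netloc` splitting is needed. A small Python 3 sketch (`host_and_port` is a hypothetical helper name, not part of the answer):

```python
from urllib.parse import urlsplit

def host_and_port(url, default_port=443):
    # urlsplit exposes .hostname (lowercased, brackets stripped) and
    # .port (already an int, or None when the URL omits it).
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port

print(host_and_port("https://www.facebook.com/find-friends/browser/"))
# ('www.facebook.com', 443)
print(host_and_port("https://example.com:8443/path"))
# ('example.com', 8443)
```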
    

    Some dangers with this code:

    1. You have to trust the CA file from the cURL project (http://curl.haxx.se/ca/cacert.pem), which is a cached version of Mozilla's CA bundle. It is also downloaded over plain HTTP, so the download itself is open to a MITM attack. It's better to replace get_ca_path with a version that returns the path to your OS's local CA file, which will vary from host to host.
    2. There is no attempt to see if the CA file has been updated. Eventually, root certs will expire or be deactivated, and new ones will be added. A good idea would be to use a cron job to delete the cached CA file, so that a new one is downloaded daily.
    3. It's probably overkill to check certificates every time. You could manually check once per run, or keep a list of 'known good' hosts over the course of the run. Or, be paranoid!
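Point 3 can be sketched as a per-run cache of host/port pairs that have already passed validation. This is a minimal sketch under assumed names: `check_ssl_once` and the injectable `checker` callable are hypothetical, standing in for the answer's `check_ssl`:

```python
_checked_hosts = set()

def check_ssl_once(hostname, port=443, checker=None):
    """Run the (possibly slow) certificate check only once per host:port."""
    key = (hostname, port)
    if key in _checked_hosts:
        return  # already validated this run; skip the network round-trip
    if checker is not None:
        checker(hostname, port)  # raises ssl.SSLError on a bad certificate
    _checked_hosts.add(key)

# Demonstration with a stub checker that records its calls:
calls = []
check_ssl_once("www.facebook.com", checker=lambda h, p: calls.append((h, p)))
check_ssl_once("www.facebook.com", checker=lambda h, p: calls.append((h, p)))
print(calls)  # [('www.facebook.com', 443)] - second call hit the cache
```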