What's the meaning of pool_connections in requests.adapters.HTTPAdapter?

夕颜 2020-12-07 23:19

When a requests Session is initialized, two HTTPAdapter instances are created and mounted to the http:// and https:// prefixes.

This is how HTTPAdapter is defined:
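The constructor's parameters and their defaults can be read straight from the installed package. A minimal sketch, assuming requests 2.x is installed (no network access is needed):

```python
import inspect

import requests
from requests.adapters import HTTPAdapter

# Print the constructor signature of the installed HTTPAdapter, defaults included.
print(inspect.signature(HTTPAdapter))

# A fresh Session already mounts one default HTTPAdapter per scheme prefix.
session = requests.Session()
print(sorted(session.adapters))  # ['http://', 'https://']
```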

  •  慢半拍i (original poster)
    2020-12-07 23:57

    Thanks to @laike9m for the existing Q&A and article, but the existing answers fail to mention the subtleties of pool_maxsize and its relation to multithreaded code.

    Summary

    • pool_connections is the number of connection pools that can be kept alive at a given time, one pool per (host, port, scheme) endpoint. If you want a Session to keep reusable connections to up to n distinct endpoints, you want pool_connections=n.
    • pool_maxsize does not cap how many connections requests opens, because pool_block in requests.adapters.HTTPAdapter defaults to False rather than True; it only limits how many connections per endpoint are kept for reuse.
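
    Both parameters end up in urllib3, and the resulting objects can be inspected without sending any request. A sketch, assuming requests 2.x on top of urllib3; note that poolmanager.pools, pool.pool and pool.block are urllib3 internals rather than documented API, so they may change between versions:

```python
from requests.adapters import HTTPAdapter

adapter = HTTPAdapter(pool_connections=2, pool_maxsize=5)
# In requests' source this is urllib3.PoolManager(num_pools=2, maxsize=5, block=False).
pm = adapter.poolmanager

# Ask for pools for three endpoints; pools are created, but nothing is sent.
for host in ("github.com", "www.rfc-editor.org", "pypi.org"):
    pool = pm.connection_from_host(host, port=443, scheme="https")

print(len(pm.pools))  # 2: the least recently used endpoint's pool was evicted
print(pool.pool.maxsize, pool.block)  # 5 False: per-pool queue size and block flag
```

    With pool_connections=2, looking up a third endpoint evicts the least recently used pool, while pool_maxsize only shows up as the size of each pool's connection queue.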

    Detail

    As correctly pointed out here, pool_connections is the maximum number of connection pools, one per (host, port, scheme) endpoint, that the adapter keeps open under its mount prefix. It's best illustrated through an example:

    >>> import requests
    >>> from requests.adapters import HTTPAdapter
    >>> 
    >>> from urllib3 import add_stderr_logger
    >>> 
    >>> add_stderr_logger()  # Turn on requests.packages.urllib3 logging
    2018-12-21 20:44:03,979 DEBUG Added a stderr logging handler to logger: urllib3
    <StreamHandler <stderr> (NOTSET)>
    >>> 
    >>> s = requests.Session()
    >>> s.mount('https://', HTTPAdapter(pool_connections=1))
    >>> 
    >>> # 4 consecutive requests to (github.com, 443, https)
    ... # A new HTTPS (TCP) connection will be established only for the first request.
    ... s.get('https://github.com/requests/requests/blob/master/requests/adapters.py')
    2018-12-21 20:44:03,982 DEBUG Starting new HTTPS connection (1): github.com:443
    2018-12-21 20:44:04,381 DEBUG https://github.com:443 "GET /requests/requests/blob/master/requests/adapters.py HTTP/1.1" 200 None
    <Response [200]>
    >>> s.get('https://github.com/requests/requests/blob/master/requests/packages.py')
    2018-12-21 20:44:04,548 DEBUG https://github.com:443 "GET /requests/requests/blob/master/requests/packages.py HTTP/1.1" 200 None
    <Response [200]>
    >>> s.get('https://github.com/urllib3/urllib3/blob/master/src/urllib3/__init__.py')
    2018-12-21 20:44:04,881 DEBUG https://github.com:443 "GET /urllib3/urllib3/blob/master/src/urllib3/__init__.py HTTP/1.1" 200 None
    <Response [200]>
    >>> s.get('https://github.com/python/cpython/blob/master/Lib/logging/__init__.py')
    2018-12-21 20:44:06,533 DEBUG https://github.com:443 "GET /python/cpython/blob/master/Lib/logging/__init__.py HTTP/1.1" 200 None
    <Response [200]>

    Above, the adapter can hold at most one connection pool, here the one for (github.com, 443, https). If you request a resource from a new (host, port, scheme) triple, the adapter internally discards the existing pool, and with it the open connection, to make room for a new one:

    >>> s.get('https://www.rfc-editor.org/info/rfc4045')
    2018-12-21 20:46:11,340 DEBUG Starting new HTTPS connection (1): www.rfc-editor.org:443
    2018-12-21 20:46:12,185 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4045 HTTP/1.1" 200 6707
    <Response [200]>
    >>> s.get('https://www.rfc-editor.org/info/rfc4046')
    2018-12-21 20:46:12,667 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4046 HTTP/1.1" 200 6862
    <Response [200]>
    >>> s.get('https://www.rfc-editor.org/info/rfc4047')
    2018-12-21 20:46:13,837 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4047 HTTP/1.1" 200 6762
    <Response [200]>


    You can raise the number to pool_connections=2, then cycle among 3 unique host combinations, and you'll see the same thing in play. (One other thing to note is that the session will retain and send back cookies in this same way.)
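
    The eviction behaviour can be modelled with a least-recently-used cache keyed by (scheme, host, port). This is a simplified sketch of the idea, not urllib3's code: ToyPoolManager and count_new_connections are hypothetical names, and each cached pool is assumed to hold one idle keep-alive connection that the next request to the same endpoint reuses.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


class ToyPoolManager:
    """Hypothetical model: an LRU cache of per-endpoint pools (not urllib3's code)."""

    def __init__(self, num_pools):
        self.num_pools = num_pools
        self.pools = OrderedDict()  # (scheme, host, port) -> placeholder pool
        self.new_connections = 0

    def request(self, url):
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        key = (parts.scheme, parts.hostname, port)
        if key in self.pools:
            # Cached pool: reuse its idle keep-alive connection.
            self.pools.move_to_end(key)
            return
        # No cached pool for this endpoint: "Starting new HTTPS connection".
        self.new_connections += 1
        self.pools[key] = object()
        if len(self.pools) > self.num_pools:
            # Drop the least recently used pool (and its connections).
            self.pools.popitem(last=False)


def count_new_connections(num_pools, urls):
    manager = ToyPoolManager(num_pools)
    for url in urls:
        manager.request(url)
    return manager.new_connections


# Same shape as the transcript: 4 requests to github.com, then 3 to www.rfc-editor.org.
transcript = ["https://github.com/a"] * 4 + ["https://www.rfc-editor.org/b"] * 3
# Cycling 3 hypothetical endpoints twice.
hosts = ["https://a.example/", "https://b.example/", "https://c.example/"]
cycle = hosts * 2

print(count_new_connections(1, transcript))  # 2
print(count_new_connections(2, cycle))  # 6: every request misses
print(count_new_connections(3, cycle))  # 3: only the first round misses
```

    The second case shows thrashing: with round-robin access over more endpoints than pools, LRU eviction always removes the pool that is needed next, so no connection is ever reused.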

    Now for pool_maxsize, which is passed to urllib3.poolmanager.PoolManager and ultimately to urllib3.connectionpool.HTTPSConnectionPool. The docstring for maxsize is:

    Number of connections to save that can be reused. More than 1 is useful in multithreaded situations. If block is set to False, more connections will be created but they will not be saved once they've been used.

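    Incidentally, block=False is the default for HTTPAdapter (pool_block), and it is also the default for urllib3's HTTPConnectionPool. This implies that pool_maxsize is not a hard limit for HTTPAdapter: more connections to the same host are opened whenever they are needed, and pool_maxsize only bounds how many of them are saved for reuse afterwards.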
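
    The docstring's behaviour can be modelled with a bounded LIFO queue per endpoint, where block=False means a caller never waits for a free connection. This is a simplified sketch, not urllib3's implementation (the real HTTPConnectionPool also pre-fills its queue with placeholders and handles timeouts and broken connections); ToyConnectionPool is a hypothetical name:

```python
import queue


class ToyConnectionPool:
    """Hypothetical per-endpoint pool with block=False (not urllib3's code)."""

    def __init__(self, maxsize):
        self.pool = queue.LifoQueue(maxsize)  # idle connections saved for reuse
        self.opened = 0
        self.discarded = 0

    def get_conn(self):
        try:
            # Reuse the most recently returned idle connection, if any.
            return self.pool.get(block=False)
        except queue.Empty:
            # block=False: never wait for a free connection, open a new one.
            self.opened += 1
            return object()  # stand-in for a new TCP connection

    def put_conn(self, conn):
        try:
            self.pool.put(conn, block=False)  # save it for reuse
        except queue.Full:
            self.discarded += 1  # pool is full: the connection is closed instead


pool = ToyConnectionPool(maxsize=2)
# Five concurrent requests (e.g. from five threads) each hold a connection...
in_use = [pool.get_conn() for _ in range(5)]
# ...and then all hand them back.
for conn in in_use:
    pool.put_conn(conn)

print(pool.opened, pool.discarded, pool.pool.qsize())  # 5 3 2
```

    With maxsize=2, all five requests still got a connection, but only two are kept for later; in the equivalent real situation urllib3 logs a warning beginning "Connection pool is full, discarding connection".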

    Furthermore, requests.Session() is not thread-safe; you shouldn't use the same session instance from multiple threads. (See here and here.) If you really need to, the safer approach is to give each thread its own session instance via threading.local(), then use that session to make requests to multiple URLs:

    import threading

    import requests
    from requests.adapters import HTTPAdapter

    # Values will be different for separate threads; starts out with no attributes.
    local = threading.local()


    def get_or_make_session(**adapter_kwargs):
        # `local` will effectively vary based on the thread that is calling it
        print('get_or_make_session() called from id:', threading.get_ident())

        if not hasattr(local, 'session'):
            # First call in this thread: build a Session with its own adapter
            session = requests.Session()
            # Forward the caller's options, e.g. pool_connections or pool_maxsize
            adapter = HTTPAdapter(**adapter_kwargs)
            session.mount('http://', adapter)
            session.mount('https://', adapter)
            local.session = session
        return local.session
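
    The thread-local pattern itself can be checked without touching the network. In this sketch FakeSession is a hypothetical stand-in for requests.Session and get_or_make is a stripped-down, hypothetical version of get_or_make_session; it confirms that repeated calls in one thread return the same object while different threads never share one:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()


class FakeSession:
    """Hypothetical stand-in for requests.Session so the sketch runs offline."""


def get_or_make():
    # Same pattern as get_or_make_session(): one object per calling thread.
    if not hasattr(local, "session"):
        local.session = FakeSession()
    return local.session


def work(_):
    first, second = get_or_make(), get_or_make()
    return threading.get_ident(), first, second


with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(work, range(20)))

by_thread = {}
for ident, first, second in results:
    by_thread.setdefault(ident, set()).update({id(first), id(second)})

# Every worker thread saw exactly one session object...
one_per_thread = all(len(ids) == 1 for ids in by_thread.values())
# ...and no two threads shared one (the objects stay alive inside `results`).
distinct = len(set().union(*by_thread.values())) == len(by_thread)
print(one_per_thread, distinct)  # True True
```

    With the real function, each worker thread would call get_or_make_session(pool_connections=..., pool_maxsize=...) and get its own Session and adapter.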
    
