How to Speed Up Python's urllib2 when doing multiple requests

深忆病人 2020-12-09 03:24

I am making several HTTP requests to a particular host using Python's urllib2 library. Each time a request is made, a new TCP and HTTP connection is created, which takes a no

3 Answers
  • 2020-12-09 03:56

    If you switch to httplib, you will have finer control over the underlying connection.

    For example:

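    import httplib  # named http.client in Python 3

    # HTTPConnection expects a host name (optionally host:port), not a full URL.
    conn = httplib.HTTPConnection(host)

    conn.request('GET', '/foo')
    r1 = conn.getresponse()
    r1.read()  # read the whole body before sending the next request

    conn.request('GET', '/bar')
    r2 = conn.getresponse()
    r2.read()

    conn.close()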
    

    This sends two HTTP GETs over the same underlying TCP connection, provided the server keeps the connection alive.
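
    To check that the connection really is reused, here is a minimal sketch written for Python 3 (where httplib is called http.client). The local test server, the EchoPathHandler class and the fetch_twice_on_one_connection function are assumptions made up for this illustration and are not part of the original answer. It compares the client-side port number after each request; an unchanged port suggests the same TCP socket served both.

```python
import threading
from http.client import HTTPConnection  # httplib in Python 2
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the requested path."""
    protocol_version = 'HTTP/1.1'  # lets the server keep connections alive

    def do_GET(self):
        body = self.path.encode('ascii')
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    # Throwaway server on a free local port, run in a background thread.
    server = HTTPServer(('127.0.0.1', 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = HTTPConnection('127.0.0.1', server.server_port)
    try:
        results = []
        for path in ('/foo', '/bar'):
            conn.request('GET', path)
            resp = conn.getresponse()
            body = resp.read()  # read fully before the next request
            # The local port identifies the client-side TCP socket.
            results.append((body, conn.sock.getsockname()[1]))
        return results
    finally:
        conn.close()       # close first so the server handler can finish
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    (body1, port1), (body2, port2) = fetch_twice_on_one_connection()
    print(body1, body2, 'same socket:', port1 == port2)
```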

  • 2020-12-09 04:15

    I've used the third-party urllib3 library to good effect in the past. It's designed to complement urllib2 by pooling connections for reuse.

    Modified example from the wiki:

    >>> from urllib3 import HTTPConnectionPool
    >>> # Create a connection pool for a specific host
    ... http_pool = HTTPConnectionPool('www.google.com')
    >>> # simple GET request, for example
    ... r = http_pool.urlopen('GET', '/')
    >>> print r.status, len(r.data)
    200 28050
    >>> r = http_pool.urlopen('GET', '/search?q=hello+world')
    >>> print r.status, len(r.data)
    200 79124
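
    For intuition about what "pooling connections for reuse" means, here is a rough standard-library sketch in Python 3. It is not urllib3's implementation: SimplePool, OkHandler, demo and the local test server are hypothetical names invented for this illustration. Idle connections sit in a thread-safe queue and are handed out again instead of being reopened.

```python
import queue
import threading
from contextlib import contextmanager
from http.client import HTTPConnection  # httplib in Python 2
from http.server import BaseHTTPRequestHandler, HTTPServer


class SimplePool:
    """Hypothetical minimal pool for one host; not urllib3's implementation."""

    def __init__(self, host, port, maxsize=2):
        self._host, self._port = host, port
        self._idle = queue.LifoQueue(maxsize)  # thread-safe stack of idle connections
        self._lock = threading.Lock()
        self.created = 0  # number of HTTPConnection objects created so far

    @contextmanager
    def _borrow(self):
        try:
            conn = self._idle.get_nowait()  # reuse an idle connection if any
        except queue.Empty:
            conn = HTTPConnection(self._host, self._port)
            with self._lock:
                self.created += 1
        try:
            yield conn
        except Exception:
            conn.close()  # state unknown after an error, so do not reuse it
            raise
        try:
            self._idle.put_nowait(conn)  # hand it back for the next caller
        except queue.Full:
            conn.close()

    def get(self, path):
        with self._borrow() as conn:
            conn.request('GET', path)
            resp = conn.getresponse()
            return resp.status, resp.read()  # read fully so conn stays reusable

    def close(self):
        while True:
            try:
                self._idle.get_nowait().close()
            except queue.Empty:
                break


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that answers every GET with 'ok'."""
    protocol_version = 'HTTP/1.1'

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', '2')
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, *args):
        pass


def demo():
    server = HTTPServer(('127.0.0.1', 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    pool = SimplePool('127.0.0.1', server.server_port)
    try:
        statuses = [pool.get('/page%d' % i)[0] for i in range(5)]
        return statuses, pool.created
    finally:
        pool.close()  # close clients first so the server handler can finish
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(demo())
```

    In practice a maintained library such as urllib3 also handles retries, timeouts and stale connections, which this sketch leaves out.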
    
  • 2020-12-09 04:18

    If you need something more automatic than plain httplib, this might help, though it's not thread-safe. It caches one connection per scheme and host and reuses it on later calls.

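    try:
        from http.client import HTTPConnection, HTTPSConnection  # Python 3
    except ImportError:
        from httplib import HTTPConnection, HTTPSConnection  # Python 2
    import select

    # Cache of open connections, keyed by (scheme, host).
    connections = {}


    def request(method, url, body=None, headers=None, **kwargs):
        # 'http://host/path' -> ['http:', '', 'host', 'path'];
        # the padding covers URLs that have no path at all.
        scheme, _, host, path = (url.split('/', 3) + [''])[:4]
        h = connections.get((scheme, host))
        # A missing socket, or an idle socket that is readable (closed by the
        # server or holding unread data), means the connection cannot be reused.
        if h and (h.sock is None or select.select([h.sock], [], [], 0)[0]):
            h.close()
            h = None
        if not h:
            Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
            h = connections[(scheme, host)] = Connection(host, **kwargs)
        h.request(method, '/' + path, body, headers or {})
        return h.getresponse()


    def urlopen(url, data=None, *args, **kwargs):
        resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
        # Note: assert statements are skipped when Python runs with -O.
        assert resp.status < 400, (resp.status, resp.reason, resp.read())
        return resp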
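
    One caveat when reusing a connection this way: with http.client (httplib), the next response on a connection cannot be fetched until the previous one has been read completely. The sketch below, written for Python 3, illustrates this against a throwaway local server. HelloHandler and demo_unread_response are hypothetical names used only for this example.

```python
import threading
from http.client import HTTPConnection, ResponseNotReady
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that answers every GET with 'hello'."""
    protocol_version = 'HTTP/1.1'

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', '5')
        self.end_headers()
        self.wfile.write(b'hello')

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo_unread_response():
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = HTTPConnection('127.0.0.1', server.server_port)
    try:
        conn.request('GET', '/first')
        conn.getresponse()  # body deliberately left unread
        conn.request('GET', '/second')
        try:
            conn.getresponse()
            return 'no error'
        except ResponseNotReady:
            # http.client refuses to start a new response while the
            # previous one is still pending.
            return 'ResponseNotReady'
    finally:
        conn.close()  # close first so the server handler can finish
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(demo_unread_response())
```

    Calling read() on each response before issuing the next request avoids the error.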
    