I am making several HTTP requests to a particular host using Python's urllib2 library. Each time a request is made, a new TCP and HTTP connection is created, which takes a noticeable amount of time. Is there any way to keep the TCP/HTTP connection alive using urllib2?
Answer 1:
If you switch to httplib, you will have finer control over the underlying connection.
For example:
import httplib

# One HTTPConnection is opened to the host and reused for both requests.
conn = httplib.HTTPConnection(host)   # pass the host name, not a full URL
conn.request('GET', '/foo')
r1 = conn.getresponse()
r1.read()
conn.request('GET', '/bar')
r2 = conn.getresponse()
r2.read()
conn.close()

This would send two HTTP GETs on the same underlying TCP connection.
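To see that a single connection really does carry multiple requests, here is a self-contained check in the Python 3 spelling (http.client is the renamed httplib); the local server and the counting subclass are just scaffolding for the demo, not part of the answer:

```python
# Count TCP connections accepted by a local server while one
# http.client.HTTPConnection issues two GET requests.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

accepted = []

class Handler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'    # HTTP/1.1 keeps the connection alive

    def do_GET(self):
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # silence per-request logging
        pass

class CountingServer(ThreadingHTTPServer):
    def get_request(self):
        sock, addr = super().get_request()
        accepted.append(addr)        # one entry per accepted TCP connection
        return sock, addr

server = CountingServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
conn.request('GET', '/foo')
conn.getresponse().read()            # must read fully before reusing the socket
conn.request('GET', '/bar')          # second request, same socket
conn.getresponse().read()
conn.close()
server.shutdown()

print(len(accepted))                 # both GETs shared one TCP connection
```

Fully reading each response before issuing the next request is what allows the socket to be reused; an unread body would force a new connection.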
Answer 2:
I've used the third-party urllib3 library to good effect in the past. It's designed to complement urllib2 by pooling connections for reuse.
Modified example from the wiki:
>>> from urllib3 import HTTPConnectionPool
>>> # Create a connection pool for a specific host
... http_pool = HTTPConnectionPool('www.google.com')
>>> # simple GET request, for example
... r = http_pool.urlopen('GET', '/')
>>> print r.status, len(r.data)
200 28050
>>> r = http_pool.urlopen('GET', '/search?q=hello+world')
>>> print r.status, len(r.data)
200 79124

Answer 3:
If you need something more automatic than plain httplib, this might help, though it's not thread-safe.
try:
    from http.client import HTTPConnection, HTTPSConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection      # Python 2
import select

# One cached connection per (scheme, host).
connections = {}

def request(method, url, body=None, headers={}, **kwargs):
    scheme, _, host, path = url.split('/', 3)
    h = connections.get((scheme, host))
    # An idle socket that is readable means the server closed it:
    # discard the stale connection and reconnect.
    if h and select.select([h.sock], [], [], 0)[0]:
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers)
    return h.getresponse()

def urlopen(url, data=None, *args, **kwargs):
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason)
    return resp
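The select.select([h.sock], [], [], 0) check above works because an idle keep-alive socket only becomes readable once the peer has closed it (a recv would return EOF). A minimal sketch of that behaviour using a local socket pair:

```python
# A socket whose peer is still connected and silent is NOT readable;
# once the peer closes, select reports it readable (recv would return b'').
import select
import socket

a, b = socket.socketpair()
live = select.select([a], [], [], 0)[0]   # peer open, nothing sent: []
b.close()
dead = select.select([a], [], [], 0)[0]   # peer closed: [a] (readable, EOF)
print(live, dead == [a])
a.close()
```

This is why the helper treats a readable cached connection as dead and rebuilds it before sending the next request.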