How to Speed Up Python's urllib2 when doing multiple requests

Anonymous (unverified), submitted 2019-12-03 01:12:01

Question:

I am making several HTTP requests to a particular host using Python's urllib2 library. Each request opens a new TCP and HTTP connection, which takes a noticeable amount of time. Is there any way to keep the TCP/HTTP connection alive using urllib2?

Answer 1:

If you switch to httplib, you will have finer control over the underlying connection.

For example:

    import httplib

    # HTTPConnection takes a host name (optionally host:port), not a full URL
    conn = httplib.HTTPConnection(host)

    conn.request('GET', '/foo')
    r1 = conn.getresponse()
    r1.read()

    conn.request('GET', '/bar')
    r2 = conn.getresponse()
    r2.read()

    conn.close()

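This sends two HTTP GET requests over the same underlying TCP connection, provided the server keeps the connection alive. Each response has to be read in full before the next request is sent.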
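
One way to check that the connection really is reused is to run the same pattern against a throwaway local server. The sketch below is an assumption-laden illustration, not part of the original answer: it uses Python 3, where httplib is named http.client, and the names EchoHandler and fetch_twice_on_one_connection are made up for this example. The server replies with the client's source port, so two identical replies mean both requests travelled over one TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the client's source port."""

    # http.server only keeps connections open when it speaks HTTP/1.1
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        # Content-Length lets the client know where the response ends
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection():
    # Port 0 asks the OS for any free port
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        ports = []
        for path in ("/foo", "/bar"):
            conn.request("GET", path)
            resp = conn.getresponse()
            # Read the body fully before issuing the next request
            ports.append(resp.read())
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_twice_on_one_connection() returns the two response bodies; if they are equal, the second GET did not open a new socket.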



Answer 2:

I've used the third-party urllib3 library to good effect in the past. It's designed to complement urllib2 by pooling connections for reuse.

Modified example from the wiki:

    >>> from urllib3 import HTTPConnectionPool
    >>> # Create a connection pool for a specific host
    ... http_pool = HTTPConnectionPool('www.google.com')
    >>> # simple GET request, for example
    ... r = http_pool.urlopen('GET', '/')
    >>> print r.status, len(r.data)
    200 28050
    >>> r = http_pool.urlopen('GET', '/search?q=hello+world')
    >>> print r.status, len(r.data)
    200 79124


Answer 3:

If you need something more automatic than plain httplib, this might help, though it is not thread-safe.

    try:
        from http.client import HTTPConnection, HTTPSConnection
    except ImportError:
        from httplib import HTTPConnection, HTTPSConnection
    import select

    # One cached connection per (scheme, host) pair
    connections = {}


    def request(method, url, body=None, headers={}, **kwargs):
        scheme, _, host, path = url.split('/', 3)
        h = connections.get((scheme, host))
        # A readable socket on an idle connection usually means the server
        # closed it, so drop the connection and open a new one
        if h and select.select([h.sock], [], [], 0)[0]:
            h.close()
            h = None
        if not h:
            Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
            h = connections[(scheme, host)] = Connection(host, **kwargs)
        h.request(method, '/' + path, body, headers)
        return h.getresponse()


    def urlopen(url, data=None, *args, **kwargs):
        resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
        assert resp.status
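
The thread-safety caveat comes from the module-level connections dictionary, which every thread shares. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to give each thread its own cache through threading.local. The helper name get_connection is hypothetical, and the sketch targets Python 3 (http.client and urllib.parse).

```python
import http.client
import threading
from urllib.parse import urlsplit

# Attributes set on a threading.local object are visible only to the
# thread that set them, so each thread ends up with a private cache.
_local = threading.local()


def get_connection(url):
    """Return the calling thread's cached connection for the URL's host."""
    parts = urlsplit(url)
    cache = getattr(_local, "connections", None)
    if cache is None:
        cache = _local.connections = {}
    key = (parts.scheme, parts.netloc)
    conn = cache.get(key)
    if conn is None:
        cls = (http.client.HTTPSConnection if parts.scheme == "https"
               else http.client.HTTPConnection)
        # Creating the object does not open a socket; that happens on
        # the first request
        conn = cache[key] = cls(parts.netloc)
    return conn
```

Two calls from the same thread for the same host return the same connection object, while another thread gets a separate one. The trade-off is one open socket per thread per host instead of one per host.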

