Python urllib2 with keep-alive


Question:

How can I make a "keep alive" HTTP request using Python's urllib2?

Answer 1:

Use the urlgrabber library. This includes an HTTP handler for urllib2 that supports HTTP 1.1 and keepalive:

>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>>
>>> fo = urllib2.urlopen('http://www.python.org')

Note: you should use urlgrabber version 3.9.0 or earlier, as the keepalive module was removed in version 3.9.1.

There is a port of the keepalive module to Python 3.



Answer 2:

Try urllib3, which has the following features:

  • Re-use the same socket connection for multiple requests (HTTPConnectionPool and HTTPSConnectionPool), with optional client-side certificate verification.
  • File posting (encode_multipart_formdata).
  • Built-in redirection and retries (optional).
  • Supports gzip and deflate decoding.
  • Thread-safe and sanity-safe.
  • Small and easy-to-understand codebase, perfect for extending and building upon.
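
As a rough sketch (assuming a recent urllib3; the URLs are only examples), a single PoolManager keeps sockets open and reuses them for requests to the same host:

    import urllib3

    # One PoolManager maintains a pool of keep-alive connections per host.
    http = urllib3.PoolManager()

    r1 = http.request('GET', 'http://www.python.org/')
    r2 = http.request('GET', 'http://www.python.org/about/')  # reuses the pooled socket
    print(r1.status, r2.status)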

Or use a much more comprehensive solution, Requests, which supports keep-alive from version 0.8.0 (by using urllib3 internally) and has the following features:

  • Extremely simple HEAD, GET, POST, PUT, PATCH, DELETE requests.
  • Gevent support for asynchronous requests.
  • Sessions with cookie persistence.
  • Basic, Digest, and custom authentication support.
  • Automatic form-encoding of dictionaries.
  • A simple dictionary interface for request/response cookies.
  • Multipart file uploads.
  • Automatic decoding of Unicode, gzip, and deflate responses.
  • Full support for Unicode URLs and domain names.
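
A minimal sketch with a modern Requests version (the Session object holds the urllib3 connection pool, so requests to the same host share a socket; the URLs are only examples):

    import requests

    # Requests made through one Session reuse keep-alive connections.
    s = requests.Session()
    r1 = s.get('http://www.python.org/')
    r2 = s.get('http://www.python.org/about/')  # same TCP connection when possible
    print(r1.status_code, r2.status_code)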


Answer 3:

Or check out httplib's HTTPConnection.
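
For example (a minimal sketch on Python 2; on Python 3 the module is http.client), httplib reuses the socket as long as each response body is fully read before the next request:

    import httplib  # http.client on Python 3

    conn = httplib.HTTPConnection('www.python.org')
    conn.request('GET', '/')
    r1 = conn.getresponse()
    r1.read()  # drain the body before reusing the connection
    conn.request('GET', '/about/')
    r2 = conn.getresponse()
    print(r1.status, r2.status)
    conn.close()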



Answer 4:

Unfortunately, keepalive.py was removed from urlgrabber on 25 Sep 2009 by the following change, made after urlgrabber was switched to depend on pycurl (which supports keep-alive):

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commit;h=f964aa8bdc52b29a2c137a917c72eecd4c4dda94

However, you can still get the last revision of keepalive.py here:

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=blob_plain;f=urlgrabber/keepalive.py;hb=a531cb19eb162ad7e0b62039d19259341f37f3a6



Answer 5:

Note that urlgrabber does not entirely work with Python 2.6. I fixed the issues (I think) by making the following modifications in keepalive.py.

In keepalive.HTTPHandler.do_open(), remove this:

     if r.status == 200 or not HANDLE_ERRORS:
         return r

And insert this:

     if r.status == 200 or not HANDLE_ERRORS:
         # [speedplane] Must return an addinfourl object
         resp = urllib2.addinfourl(r, r.msg, req.get_full_url())
         resp.code = r.status
         resp.msg = r.reason
         return resp


Answer 6:

Please avoid collective pain and use Requests instead. It will do the right thing by default and use keep-alive if applicable.



Answer 7:

Here's a somewhat similar urlopen() that does keep-alive, though it's not thread-safe.

    try:
        from http.client import HTTPConnection, HTTPSConnection  # Python 3
    except ImportError:
        from httplib import HTTPConnection, HTTPSConnection  # Python 2
    import select

    connections = {}


    def request(method, url, body=None, headers={}, **kwargs):
        scheme, _, host, path = url.split('/', 3)
        h = connections.get((scheme, host))
        # If the idle socket is readable, the server has closed it (or sent
        # stray data), so drop it and open a fresh connection.
        if h and select.select([h.sock], [], [], 0)[0]:
            h.close()
            h = None
        if not h:
            Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
            h = connections[(scheme, host)] = Connection(host, **kwargs)
        h.request(method, '/' + path, body, headers)
        return h.getresponse()


    def urlopen(url, data=None, *args, **kwargs):
        resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
        assert resp.status
        return resp
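
Hypothetical usage (the URLs are only examples; each response body must be drained before the cached connection can be reused):

    # Both requests go to the same host, so the second one reuses the
    # connection cached in `connections`.
    resp = urlopen('http://www.python.org/')
    print(resp.status)
    resp.read()  # drain the body so the socket stays reusable
    resp = urlopen('http://www.python.org/about/')
    print(resp.status)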

