Ideal method for sending multiple HTTP requests over Python? [duplicate]

折月煮酒 提交于 2019-12-28 02:41:07

问题


Possible Duplicate:
Multiple (asynchronous) connections with urllib2 or other http library?

I am working on a Linux web server that runs Python code to grab realtime data over HTTP from a 3rd party API. The data is put into a MySQL database. I need to make a lot of queries to a lot of URL's, and I need to do it fast (faster = better). Currently I'm using urllib3 as my HTTP library. What is the best way to go about this? Should I spawn multiple threads (if so, how many?) and have each query for a different URL? I would love to hear your thoughts about this - thanks!


回答1:


If a lot is really a lot than you probably want use asynchronous io not threads.

requests + gevent = grequests

GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
grequests.map(rs)



回答2:


You should use multithreading as well as pipelining requests. For example search->details->save

The number of threads you can use doesn't depend on your equipment only. How many requests the service can serve? How many concurrent requests does it allow to run? Even your bandwidth can be a bottleneck.

If you're talking about a kind of scraping - the service could block you after certain limit of requests, so you need to use proxies or multiple IP bindings.

As for me, in the most cases, I can run 50-300 concurrent requests on my laptop from python scripts.




回答3:


Sounds like an excellent application for Twisted. Here are some web-related examples, including how to download a web page. Here is a related question on database connections with Twisted.

Note that Twisted does not rely on threads for doing multiple things at once. Rather, it takes a cooperative multitasking approach---your main script starts the reactor and the reactor calls functions that you set up. Your functions must return control to the reactor before the reactor can continue working.



来源:https://stackoverflow.com/questions/10555292/ideal-method-for-sending-multiple-http-requests-over-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!