Asynchronous Requests with Python requests

Submitted by 那年仲夏 on 2019-12-16 19:54:55

Question


I tried the sample provided within the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content

Answer 1:


Note

The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as is to reflect the original question which was about using requests < v0.13.0.


To do multiple tasks with async.map asynchronously, you have to:

  1. Define a function for what you want to do with each object (your task)
  2. Add that function as an event hook in your request
  3. Call async.map on a list of all the requests / actions

Example:

from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to apply to each response object
def do_something(response):
    print(response.url)

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks={...}" part is where you define what you want to do.
    #
    # Note the lack of parentheses after do_something: the response
    # will be passed as the first argument automatically.
    action_item = async.get(u, hooks={'response': do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)



Answer 2:


async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

Installation:

$ pip install grequests

Usage:

Build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

Send the stack:

grequests.map(rs)

The result looks like:

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequests doesn't seem to set a limit on concurrent requests out of the box, e.g. when multiple requests are sent to the same server.
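
If you do need to cap concurrency, grequests.map accepts a size argument that bounds the underlying gevent pool. A minimal sketch (the URL list is just a placeholder):

import grequests

urls = ['http://httpbin.org/get'] * 10

rs = (grequests.get(u) for u in urls)

# size bounds the gevent pool, so at most 5 requests
# are in flight at any one time
responses = grequests.map(rs, size=5)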




Answer 3:


I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests in a ThreadPoolExecutor, and it was almost as fast as grequests, but without external dependencies.

import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# initialize the counters before the loop uses them
resp_ok = 0
resp_err = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1



Answer 4:


Maybe requests-futures is another choice.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

It is also recommended in the official documentation. If you don't want to involve gevent, it's a good option.




Answer 5:


I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

from simple_requests import Requests

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

for response in Requests().swarm(list_of_requests):
    print(response.content)

The docs are here: http://pythonhosted.org/simple-requests/




Answer 6:


import urllib2
from threading import Thread

threads = list()

# start one thread per request URI
for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

# wait for all of them to finish
for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...



Answer 7:


If you want to use asyncio, then requests-async provides async/await functionality for requests: https://github.com/encode/requests-async
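
A minimal sketch of its usage, adapted from that project's README (the requests_async import name and the awaitable get come from there; the URL is a placeholder):

import asyncio

import requests_async as requests

async def main():
    # requests_async mirrors the requests API, but get() is awaitable
    response = await requests.get('http://httpbin.org/get')
    print(response.status_code)
    print(response.text)

asyncio.run(main())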




Answer 8:


I have been using Python requests for async calls against GitHub's gist API for some time.

For an example, see the code here:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing and I will document it.




Answer 9:


I have also tried some things using the asynchronous methods in Python, however I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted.

http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html
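
For flavor, here is a minimal sketch of an asynchronous GET with Twisted's twisted.web.client.Agent; this is an illustration under my own assumptions, not the code from the linked post:

from twisted.internet import reactor
from twisted.web.client import Agent, readBody

agent = Agent(reactor)

def handle_response(response):
    # readBody returns a Deferred that fires with the full body
    d = readBody(response)
    d.addCallback(lambda body: print(body))
    return d

d = agent.request(b'GET', b'http://httpbin.org/get')
d.addCallback(handle_response)
# stop the reactor once the request chain finishes (or fails)
d.addBoth(lambda _: reactor.stop())
reactor.run()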




Answer 10:


So many third-party libraries are suggested here, and nearly all of them are deprecated, point elsewhere, or have severely limited functionality.

I have a lot of issues with most of the answers posted: they either use deprecated libraries that have been ported over with limited features, or provide a solution with so much magic around the execution of the request that error handling becomes difficult.

Some of the solutions work alright for pure HTTP requests, but fall short for any other kind of request.

I believe simply using the built-in asyncio library is sufficient to perform asynchronous requests of any type, and it provides enough flexibility for complex, use-case-specific error handling.

import asyncio

import requests

loop = asyncio.get_event_loop()

def do_thing(channels):
    # perform_grpc_call, do_chores, and URL are placeholders for
    # whatever per-channel work you need to perform
    async def get_chan_info_and_do_chores(ch_id):
        response = perform_grpc_call(ch_id)
        do_chores(response)

    async def get_httpapi_info_and_do_chores(ch_id):
        response = requests.get(URL)
        do_chores(response)

    async_tasks = []
    for ch_id in list(channels):
        async_tasks.append(loop.create_task(get_chan_info_and_do_chores(ch_id)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(ch_id)))

    loop.run_until_complete(asyncio.gather(*async_tasks))

How it works is simple: you're basically creating a series of tasks you'd like to run asynchronously, then asking a loop to execute those tasks and exit upon completion. No extra libraries subject to lack of maintenance, and no missing functionality.
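
One caveat: requests.get is a blocking call, so an async def that calls it directly will not overlap its I/O with other tasks. The usual workaround is to offload the blocking call to a thread pool via run_in_executor. A minimal sketch under that assumption (the URL list is a placeholder):

import asyncio

import requests

async def fetch(url):
    loop = asyncio.get_running_loop()
    # run the blocking requests.get in the default thread pool
    # so the event loop stays free to schedule other tasks
    return await loop.run_in_executor(None, requests.get, url)

async def main():
    urls = ['http://httpbin.org/get', 'http://httpbin.org/ip']
    responses = await asyncio.gather(*(fetch(u) for u in urls))
    for r in responses:
        print(r.status_code)

asyncio.run(main())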



Source: https://stackoverflow.com/questions/9110593/asynchronous-requests-with-python-requests
