aiohttp: rate limiting parallel requests

孤独总比滥情好 2020-12-05 11:41

APIs often have rate limits that users have to follow. As an example, let's take 50 requests/second. Sequential requests take 0.5-1 second each and are thus far too slow to come close to that limit.
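
To make the gap concrete, here is the sequential baseline as a minimal sketch (the example.com URLs and the aiohttp calls are illustrative, not part of the original question): at roughly 0.75 s per call, a plain loop manages 1-2 requests/second against a 50 requests/second budget.

    import asyncio
    import aiohttp

    URLS = [f'https://api.example.com/item/{i}' for i in range(100)]  # placeholder

    async def fetch_sequentially():
      # one request at a time: total time is the sum of all latencies,
      # so at 0.5-1 s per call throughput caps out at 1-2 requests/second
      async with aiohttp.ClientSession() as session:
        for url in URLS:
          async with session.get(url) as resp:
            await resp.text()

    asyncio.run(fetch_sequentially())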

3 Answers
  •  悲哀的现实
    2020-12-05 12:02

    I liked how @sraw approached this with asyncio, but their answer didn't quite cut it for me. Since I don't know whether each of my calls to download will be faster or slower than the rate limit, I want the option to run many in parallel when requests are slow and one at a time when requests are very fast, so that I'm always right at the rate limit.

    I do this with a queue and a producer that enqueues new jobs at the rate limit, plus many consumers. If the consumers are fast, they all wait on the next job; if they are slow, work backs up in the queue and they process it as fast as the processor/network allows:

    import asyncio
    from datetime import datetime

    async def download(url):
      # stub for the real request: simulate 0.1 s of network time and
      # report when it finished
      task_time = 1/10
      await asyncio.sleep(task_time)
      result = datetime.now()
      return result, url
    
    async def producer_fn(queue, urls, max_per_second):
      # enqueue jobs at exactly the rate limit, so consumers can never
      # start work faster than max_per_second allows
      for url in urls:
        await queue.put(url)
        await asyncio.sleep(1/max_per_second)
     
    async def consumer(work_queue, result_queue):
      # run jobs as fast as the processor/network allows; get() blocks
      # whenever the consumers outpace the producer
      while True:
        url = await work_queue.get()
        result = await download(url)
        work_queue.task_done()
        await result_queue.put(result)
    
    urls = range(20)
    async def main():
      work_queue = asyncio.Queue()
      result_queue = asyncio.Queue()
    
      num_consumer_tasks = 10
      max_per_second = 5
      consumers = [asyncio.create_task(consumer(work_queue, result_queue))
                   for _ in range(num_consumer_tasks)]    
      producer = asyncio.create_task(producer_fn(work_queue, urls, max_per_second))
      await producer
    
      # wait for the remaining tasks to be processed
      await work_queue.join()
      # cancel the consumers, which are now idle
      for c in consumers:
        c.cancel()
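      # optional, not in the original answer: awaiting the cancelled tasks
      # lets each CancelledError be retrieved before the program moves on
      # await asyncio.gather(*consumers, return_exceptions=True)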
    
      while not result_queue.empty():
        result, url = await result_queue.get()
        print(f'{url} finished at {result}')
     
    asyncio.run(main())
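
    Since the question is about aiohttp, the sleep stub above can be swapped for a real HTTP call. Below is a minimal sketch of that wiring, reusing producer_fn from the code above; the shared session and the example.com URLs are assumptions for illustration:

    import asyncio
    import aiohttp
    from datetime import datetime

    async def download(session, url):
      # real work instead of the sleep stub: GET the url and
      # record when the response arrived
      async with session.get(url) as resp:
        await resp.text()
      return datetime.now(), url

    async def consumer(session, work_queue, result_queue):
      while True:
        url = await work_queue.get()
        result = await download(session, url)
        work_queue.task_done()
        await result_queue.put(result)

    async def main():
      urls = [f'https://api.example.com/item/{i}' for i in range(20)]  # placeholder
      work_queue = asyncio.Queue()
      result_queue = asyncio.Queue()
      async with aiohttp.ClientSession() as session:
        consumers = [asyncio.create_task(consumer(session, work_queue, result_queue))
                     for _ in range(10)]
        await producer_fn(work_queue, urls, max_per_second=5)
        await work_queue.join()
        for c in consumers:
          c.cancel()
      while not result_queue.empty():
        result, url = await result_queue.get()
        print(f'{url} finished at {result}')

    asyncio.run(main())

    Sharing one ClientSession across all consumers is the usual aiohttp pattern: a session pools connections, so repeated requests reuse sockets instead of reconnecting each time.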
    
