Limiting/throttling the rate of HTTP requests in GRequests

盖世英雄少女心 2020-12-23 14:46

I'm writing a small script in Python 2.7.3 with GRequests and lxml that will allow me to gather some collectible card prices from various websites and compare them. Problem

4 Answers
  •  轮回少年
    2020-12-23 15:16

    I had a similar problem. Here's my solution. In your case, I would do:

    def worker():
        with rate_limit('slow.domain.com', 2):
            response = requests.get('https://slow.domain.com/path')
            text = response.text
        # Use `text`
    

    Assuming you're pulling from multiple domains, I would set up a dictionary mapping each domain to its delay so you don't hit any of their rate limits.
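
    A minimal sketch of such a mapping, assuming Python 3's `urllib.parse` (on Python 2.7 the import would be `from urlparse import urlparse`). The delay values, the `DEFAULT_DELAY` fallback, and the `delay_for` helper are hypothetical and only illustrate the lookup:

```python
from urllib.parse import urlparse

# Hypothetical per-domain delays in seconds; tune these to each site's limits.
DOMAIN_DELAYS = {
    'slow.domain.com': 2,
    'foo.bar.com': 1,
}
DEFAULT_DELAY = 0.5  # assumed fallback for domains not listed above


def delay_for(url):
    """Return (domain, delay) for `url`, falling back to DEFAULT_DELAY."""
    domain = urlparse(url).hostname
    return domain, DOMAIN_DELAYS.get(domain, DEFAULT_DELAY)
```

    A worker could then call `domain, delay = delay_for(url)` and wrap its request in `with rate_limit(domain, delay):`.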

    The code below assumes you're using gevent with monkey patching applied.

    from contextlib import contextmanager
    from time import time
    
    import gevent  # needed for gevent.spawn and gevent.sleep below
    from gevent.event import Event
    from gevent.queue import Queue
    
    
    def rate_limit(resource, delay, _queues={}):
        """Delay use of `resource` until after `delay` seconds have passed.
    
        Example usage:
    
        def worker():
            with rate_limit('foo.bar.com', 1):
                response = requests.get('https://foo.bar.com/path')
                text = response.text
            # use `text`
    
        This will serialize and delay requests from multiple workers for resource
        'foo.bar.com' by 1 second.
    
        """
    
        if resource not in _queues:
            queue = Queue()
            gevent.spawn(_watch, queue)
            _queues[resource] = queue
    
        return _resource_manager(_queues[resource], delay)
    
    
    def _watch(queue):
        "Watch `queue` and wake event listeners after delay."
    
        last = 0
    
        while True:
            event, delay = queue.get()
    
            now = time()
    
            if (now - last) < delay:
                gevent.sleep(delay - (now - last))
    
            event.set()   # Wake worker but keep control.
            event.clear()
            event.wait()  # Yield control until woken.
    
            last = time()
    
    
    @contextmanager
    def _resource_manager(queue, delay):
        "`with` statement support for `rate_limit`."
    
        event = Event()
        queue.put((event, delay))
    
        event.wait()  # Wait for queue watcher to wake us.
    
        try:
            yield
        finally:
            # Wake the queue watcher even if the `with` body raised;
            # otherwise the watcher would block forever on this event.
            event.set()
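
    For comparison, the same serialize-and-delay idea can be sketched without gevent, using only the standard library's `threading` module on Python 3. This is a swapped-in illustration, not the approach above: the `ThreadRateLimiter` class and its `limit` method are hypothetical names, and it blocks OS threads rather than greenlets.

```python
import threading
import time
from contextlib import contextmanager


class ThreadRateLimiter(object):
    """Serialize access to a resource and space uses `delay` seconds apart."""

    def __init__(self, delay):
        self.delay = delay
        self._lock = threading.Lock()
        self._last = None  # monotonic time at which the previous use finished

    @contextmanager
    def limit(self):
        with self._lock:  # only one thread uses the resource at a time
            if self._last is not None:
                remaining = self.delay - (time.monotonic() - self._last)
                if remaining > 0:
                    time.sleep(remaining)
            try:
                yield
            finally:
                # Record the finish time even if the body raised.
                self._last = time.monotonic()
```

    One limiter per domain would play the role of the per-resource queue: `with limiter.limit():` around each request gives the same "one at a time, at least `delay` seconds apart" behavior.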
    
