Limiting/throttling the rate of HTTP requests in GRequests

盖世英雄少女心 2020-12-23 14:46

I'm writing a small script in Python 2.7.3 with GRequests and lxml that will allow me to gather some collectible card prices from various websites and compare them. Problem

4 answers
  •  春和景丽
    2020-12-23 15:27

    It doesn't look like there's any simple mechanism for handling this built into the requests or grequests code. The only hook that seems to be available is for responses.

    Here's a super hacky workaround to at least prove it's possible - I modified grequests to keep a list of the times at which requests were issued and to delay the creation of each AsyncRequest until the request rate drops below the maximum (here, 8 requests per 15 seconds).

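    import time

    import gevent

    q = []  # times at which requests were issued (never trimmed)

    class AsyncRequest(object):
        def __init__(self, method, url, **kwargs):
            print('%s init' % self)
            # Wait until fewer than 8 requests were issued in the last 15 seconds.
            waiting = True
            while waiting:
                if len([x for x in q if x > time.time() - 15]) < 8:
                    q.append(time.time())
                    waiting = False
                else:
                    print('%s snoozing' % self)
                    gevent.sleep(1)
            # (remainder of the original grequests __init__ not shown)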
    

    You can use grequests.imap() to watch this interactively:

    import time
    import rg  # presumably the modified copy of grequests described above

    urls = [
        'http://www.heroku.com',
        'http://python-tablib.org',
        'http://httpbin.org',
        'http://python-requests.org',
        'http://kennethreitz.com',
        'http://www.cnn.com',
    ]

    def print_url(r, *args, **kwargs):
        # Response hook: print each URL and the time its response arrived.
        print('%s %s' % (r.url, time.time()))

    hook_dict = dict(response=print_url)
    rs = (rg.get(u, hooks=hook_dict) for u in urls)
    for r in rg.imap(rs):
        print(r)
    

    I wish there was a more elegant solution, but so far I can't find one. Looked around in sessions and adapters. Maybe the poolmanager could be augmented instead?

    Also, I wouldn't put this code in production - the 'q' list never gets trimmed and would eventually get pretty big. Plus, I don't know if it's actually working as advertised. It just looks like it is when I look at the console output.
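
    The untrimmed 'q' list could probably be avoided with a deque that drops timestamps once they fall out of the window. This is only a minimal sketch, assuming the same limit of 8 requests per 15 seconds. SlidingWindowLimiter and its clock/sleep parameters are hypothetical names of my own, not part of requests or grequests:

```python
import time
from collections import deque


class SlidingWindowLimiter(object):
    """Allow at most max_calls within any window of period seconds.

    Hypothetical helper for illustration; not part of requests or grequests.
    """

    def __init__(self, max_calls, period, clock=time.time, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the logic can be tested
        self.sleep = sleep      # injectable, e.g. a cooperative sleep
        self.stamps = deque()   # issue times, oldest first

    def _trim(self, now):
        # Drop timestamps that have left the window so the deque stays small.
        while self.stamps and self.stamps[0] <= now - self.period:
            self.stamps.popleft()

    def acquire(self):
        """Block until a slot is free, then record the call."""
        while True:
            now = self.clock()
            self._trim(now)
            if len(self.stamps) < self.max_calls:
                self.stamps.append(now)
                return
            # Window is full: sleep until the oldest timestamp expires.
            self.sleep(self.stamps[0] + self.period - now)


if __name__ == "__main__":
    # Small numbers so the demo finishes quickly: 2 calls per 0.2 seconds.
    limiter = SlidingWindowLimiter(max_calls=2, period=0.2)
    for i in range(5):
        limiter.acquire()
        print("request %d issued at %.2f" % (i, time.time()))
```

    In the hack above, acquire() would take the place of the while loop in AsyncRequest.__init__. Passing sleep=gevent.sleep should keep the wait cooperative, so only the current greenlet pauses, but I haven't tried this against grequests itself.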

    Ugh. Just looking at this code I can tell it's 3am. Time to goto bed.
