Using grequests to make several thousand GET requests to SourceForge, get “Max retries exceeded with url”

Virgil

So I'm answering my own question here; maybe it will help others.

In my case it was not rate limiting by the destination server, but something much simpler: I didn't explicitly close the responses, so they kept their sockets open and the Python process ran out of file handles.

My solution (I don't know for sure which change fixed the issue; theoretically either of them should) was to:

  • Set stream=False in grequests.get:

    rs = (grequests.get(u, stream=False) for u in urls)
    
  • Explicitly call response.close() after reading response.content:

    responses = grequests.map(rs)
    for response in responses:
        make_use_of(response.content)
        response.close()
    

Note: simply destroying the response object (assigning None to it and calling gc.collect()) was not enough; this did not close the file handles.
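A minimal sketch of the difference (assuming Linux, where the process's open descriptors can be counted via /proc/self/fd; urls is the same list of URLs as above):

    import os
    import grequests

    def open_fds():
        # Count the file descriptors currently held by this process (Linux only).
        return len(os.listdir('/proc/self/fd'))

    rs = (grequests.get(u, stream=False) for u in urls)
    responses = grequests.map(rs)

    print("open fds before closing:", open_fds())
    for response in responses:
        if response is not None:   # grequests.map yields None for failed requests
            response.close()       # this is what actually releases the socket
    print("open fds after closing:", open_fds())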

The snippet below batches the requests and can easily be changed to use whatever number of connections you want:

import time
import grequests

MAX_CONNECTIONS = 100  # number of connections you want to allow per batch
# urlsList: your list of URLs

results = []
for x in range(0, len(urlsList), MAX_CONNECTIONS):
    rs = (grequests.get(u, stream=False) for u in urlsList[x:x + MAX_CONNECTIONS])
    time.sleep(0.2)  # you can change this to whatever you see works better
    results.extend(grequests.map(rs))  # the key here is to extend, not append or insert
    print("Waiting")  # optional, so you see something is being done