Using grequests to make several thousand GET requests to SourceForge, get “Max retries exceeded with url”

有刺的猬 2021-01-07 19:33

I am very new to all of this; I need to obtain data on several thousand SourceForge projects for a paper I am writing. The data is all freely available in JSON format at the

2 Answers
  •  庸人自扰
    2021-01-07 20:24

    In my case it was not rate limiting by the destination server, but something much simpler: I didn't explicitly close the responses, so they kept their sockets open and the Python process ran out of file handles.

    My solution was to do both of the following (I don't know for sure which one fixed the issue; in theory either should):

    • Set stream=False in grequests.get:

       rs = (grequests.get(u, stream=False) for u in urls)
      
    • Explicitly call response.close() after reading response.content:

       responses = grequests.map(rs)
       for response in responses:
           make_use_of(response.content)
           response.close()
      

    Note: simply destroying the response object (assigning None to it and calling gc.collect()) was not enough; that did not close the file handles.
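
    The failure mode described above — exhausting the per-process file-descriptor limit — can be reproduced without any network traffic, since every unclosed response holds a socket, and sockets and plain files draw from the same descriptor pool. A minimal standard-library sketch (the count of 10 and the temporary file are illustrative; the `resource` module is Unix-only):

    ```python
    import os
    import resource
    import tempfile

    # The soft limit is how many file descriptors this process may hold open;
    # each unclosed response keeps one socket counted against it.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Simulate the leak with plain files instead of sockets: each open()
    # below consumes one descriptor until .close() is called.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()
    leaked = [open(tmp.name) for _ in range(10)]

    # Dropping the references (leaked = None) would only release the
    # descriptors whenever the garbage collector happens to run;
    # closing explicitly is deterministic.
    for f in leaked:
        f.close()
    os.unlink(tmp.name)
    ```

    With several thousand simultaneous requests, a typical default soft limit (often 1024 on Linux) is exhausted quickly, which is why closing each response as soon as its content has been read matters.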
