Using grequests to make several thousand GET requests to SourceForge, I get “Max retries exceeded with url”

有刺的猬 2021-01-07 19:33

I am very new to all of this. I need to obtain data on several thousand SourceForge projects for a paper I am writing. The data is all freely available in JSON format at the

2 answers
  •  时光取名叫无心
    2021-01-07 20:21

    This approach sends the requests in batches, so you can easily limit it to however many simultaneous connections you want.

    import time
    import grequests
    
    MAX_CONNECTIONS = 100  # Number of requests to send at the same time
    urlsList = []  # Fill in your list of URLs
    
    results = []
    # Start at 0 (not 1) and step through the whole list, so no URL is skipped
    for x in range(0, len(urlsList), MAX_CONNECTIONS):
        rs = (grequests.get(u, stream=False) for u in urlsList[x:x + MAX_CONNECTIONS])
        # Extend, not append or insert: keeps results a flat list.
        # Requests that failed come back as None.
        results.extend(grequests.map(rs))
        print("Waiting")  # Optional, so you can see progress
        time.sleep(0.2)  # Pause between batches; adjust to whatever works best
    
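    A note on the loop above: `grequests.map` puts `None` in place of any request that raised (for example a `ConnectionError` reporting "Max retries exceeded"), and it also accepts a `size` argument that caps how many requests run at once. If you would rather avoid gevent, here is a minimal stdlib-only sketch of the same batch-and-pause idea. It swaps in `concurrent.futures.ThreadPoolExecutor` for grequests and records failed URLs so they can be retried later. The names `batches`, `fetch_all`, the `fetch` callback and the `pause` parameter are hypothetical, not part of grequests or the original answer.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long.

    The range starts at 0, so the first item is included.
    """
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_all(urls, fetch, max_connections=100, pause=0.2):
    """Fetch `urls` in batches of `max_connections`, pausing between batches.

    `fetch` is any callable url -> response that raises on failure
    (hypothetical; e.g. a wrapper around an HTTP client).
    Returns (results, failed): a dict url -> response for successes and a
    list of URLs whose fetch raised, so they can be retried later.
    """
    results, failed = {}, []

    def safe_fetch(url):
        # Catch errors per URL so one failure does not abort the batch.
        try:
            return url, fetch(url), None
        except Exception as exc:
            return url, None, exc

    # The pool never runs more than max_connections fetches at once.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        for i, batch in enumerate(batches(urls, max_connections)):
            if i > 0:
                time.sleep(pause)  # space out consecutive batches
            # pool.map yields results in the same order as `batch`
            for url, response, exc in pool.map(safe_fetch, batch):
                if exc is None:
                    results[url] = response
                else:
                    failed.append(url)
    return results, failed
```

    With real HTTP you could pass something like `lambda u: requests.get(u, timeout=10).json()` as `fetch` (assuming the `requests` package is installed), then call `fetch_all(failed, ...)` again later with a longer pause to retry only the failures.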
