What is the fastest way to send 100,000 HTTP requests in Python?

暖寄归人 · asked 2020-11-22 07:12

I am opening a file which has 100,000 URLs. I need to send an HTTP request to each URL and print the status code. I am using Python 2.6, and so far looked at the many confusing ways Python implements threading/concurrency.

16 Answers
  •  一整个雨季
     answered 2020-11-22 07:43

    Things have changed quite a bit since 2010, when this question was posted. I haven't tried all the other answers, but I have tried a few, and this worked best for me on Python 3.6.

    I was able to fetch about 150 unique domains per second running on AWS.

    import pandas as pd
    import concurrent.futures
    import requests
    import time
    
    out = []
    CONNECTIONS = 100  # number of worker threads
    TIMEOUT = 5        # per-request timeout in seconds
    
    # One domain per line; skip the first line of the file.
    tlds = open('../data/sample_1k.txt').read().splitlines()
    urls = ['http://{}'.format(x) for x in tlds[1:]]
    
    def load_url(url, timeout):
        # A HEAD request is enough to get the status code without downloading the body.
        ans = requests.head(url, timeout=timeout)
        return ans.status_code
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
        future_to_url = (executor.submit(load_url, url, TIMEOUT) for url in urls)
        time1 = time.time()
        for future in concurrent.futures.as_completed(future_to_url):
            try:
                data = future.result()
            except Exception as exc:
                # Record the exception type (timeout, connection error, ...) instead of a status code.
                data = str(type(exc))
            finally:
                out.append(data)
                print(str(len(out)), end="\r")  # simple progress counter
    
        time2 = time.time()
    
    print(f'Took {time2-time1:.2f} s')
    print(pd.Series(out).value_counts())
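
    One gotcha with the generator version above: `as_completed` yields futures in completion order, so you cannot tell which URL produced which result. Building a dict mapping each future back to its URL fixes that. A minimal sketch, with a stub `check` function and made-up URLs standing in for `requests.head`:

    ```python
    import concurrent.futures

    def check(url):
        # Stub standing in for requests.head; the failure logic is for illustration only.
        if "bad" in url:
            raise ValueError("unreachable")
        return 200

    urls = ["http://example.com", "http://bad.example", "http://example.org"]
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # A dict instead of a generator lets us recover the URL of each finished future.
        future_to_url = {executor.submit(check, u): u for u in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = type(exc).__name__

    print(results)
    ```

    The same pattern drops straight into the answer above: replace `check` with `load_url` and the per-URL status codes (or error names) come back keyed by URL.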
    
