How to use multiprocessing to loop through a big list of URLs?

我寻月下人不归 2020-12-16 08:36

Problem: Check a list of over 1000 URLs and get each URL's return code (status_code).

The script I have works, but it is very slow.

I am thinking there has to be a better way to do this.
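
The original script is not shown here; for reference, a sequential check of this kind (a minimal sketch, assuming the URLs sit one per line in the url10.txt file used in the answer below) might look roughly like this, and it is slow simply because each request blocks until it completes or times out:

    import requests
    
    # One URL per line; url10.txt is assumed from the answer below
    with open("url10.txt") as f:
        urls = f.read().splitlines()
    
    # Sequential baseline: requests run one after another, so 1000+ URLs
    # with a 1-second timeout can take many minutes
    for url in urls:
        try:
            resp = requests.get('http://' + url, timeout=1)
            print(resp.status_code, '->', resp.url)
        except requests.RequestException:
            print("Error", url)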

2 Answers
  •  我在风中等你
    2020-12-16 09:12

    In the checkurlconnection function, the parameter should be a single url, not the whole list: Pool.map calls the function once for each element of urls, so the for loop inside it is not needed. If the function loops over the global urls instead, every worker re-checks the entire list, which is not what you want.

    import requests
    from multiprocessing import Pool
    
    with open("url10.txt") as f:
        urls = f.read().splitlines()
    
    def checkurlconnection(url):
        # Pool.map calls this once per URL, so it handles a single url string
        url = 'http://' + url
        try:
            resp = requests.get(url, timeout=1)
            print(len(resp.content), '->', resp.status_code, '->', resp.url)
        except Exception:
            print("Error", url)
    
    if __name__ == "__main__":
        # 4 worker processes; each call handles one URL from the list
        with Pool(processes=4) as p:
            p.map(checkurlconnection, urls)
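
    Since this job is network-bound rather than CPU-bound, a thread pool usually works just as well as a process pool and avoids the overhead of starting separate processes. Below is a minimal sketch of the same check using concurrent.futures.ThreadPoolExecutor (an alternative to the multiprocessing version above, reusing the same url10.txt file; the worker count of 20 is an arbitrary example):

    import requests
    from concurrent.futures import ThreadPoolExecutor
    
    def checkurlconnection(url):
        # Same single-URL worker as above
        try:
            resp = requests.get('http://' + url, timeout=1)
            print(len(resp.content), '->', resp.status_code, '->', resp.url)
        except Exception:
            print("Error", url)
    
    if __name__ == "__main__":
        with open("url10.txt") as f:
            urls = f.read().splitlines()
        # Threads share memory, so arguments do not need to be pickled;
        # executor.map hands one URL to each call of checkurlconnection
        with ThreadPoolExecutor(max_workers=20) as executor:
            executor.map(checkurlconnection, urls)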
    
