Threading in Python doesn't happen in parallel

孤街浪徒 2020-12-04 01:28

I'm doing data-scraping calls with urllib2, and each one takes around 1 second to complete. I was trying to test whether I could multi-thread the URL-call loop with threading.

2 Answers
  • 2020-12-04 01:42

    To get multiple urls in parallel limiting to 20 connections at a time:

    import urllib2
    from multiprocessing.dummy import Pool
    
    def generate_urls(): # generate some dummy urls
        for i in range(100):
            yield 'http://example.com?param=%d' % i
    
    def get_url(url):
        try:
            return url, urllib2.urlopen(url).read(), None
        except EnvironmentError as e:
            return url, None, e
    
    pool = Pool(20) # limit number of concurrent connections
    for url, result, error in pool.imap_unordered(get_url, generate_urls()):
        if error is None:
           print result,
    
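    The snippet above is Python 2 (urllib2, the print statement). A rough Python 3 sketch of the same pattern — a thread pool capped at 20 workers — using concurrent.futures; the fetch here is a stub standing in for urllib.request.urlopen so the sketch runs offline:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_urls():
    # dummy URLs, mirroring the snippet above
    for i in range(100):
        yield 'http://example.com?param=%d' % i

def fetch(url):
    # stand-in for urllib.request.urlopen(url).read();
    # swap in the real call (and catch OSError) for actual scraping
    return url, 'body-of-%s' % url, None

# cap concurrency at 20 threads, like Pool(20) above
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(fetch, generate_urls()))

ok = [url for url, body, error in results if error is None]
print(len(ok))  # 100
```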
  • 2020-12-04 02:02

    Paul Seeb has correctly diagnosed your issue.

    You are calling trade.update_items, and then passing the result to the threading.Thread constructor. Thus, you get serial behavior: your threads don't do any work, and the creation of each one is delayed until the update_items call returns.

    The correct form is threading.Thread(target=trade.update_items, args=(1, 100)) for the first line, and similarly for the later ones. This passes the update_items function itself as the thread's entry point, and the tuple (1, 100) is unpacked as its positional arguments when the thread runs.
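    To make the difference concrete, a minimal sketch (update_items below is a hypothetical stand-in for trade.update_items):

```python
import threading

calls = []

def update_items(start, end):
    # hypothetical stand-in for trade.update_items
    calls.append((start, end))

# Wrong: update_items(1, 100) runs immediately, in the current thread,
# and its return value (None) is what gets handed to Thread as target:
#   t = threading.Thread(target=update_items(1, 100))

# Right: pass the function itself plus its arguments; the call
# happens later, inside the new thread's run():
t = threading.Thread(target=update_items, args=(1, 100))
t.start()
t.join()
print(calls)  # [(1, 100)]
```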
