I'm doing data-scraping calls with urllib2, and each one takes around 1 second to complete. I was trying to test whether I could multi-thread the URL-call loop using threading.
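For context, a minimal sketch of the serial version of such a loop (the URL list is an assumption; the one-request-at-a-time structure is the point):

    import urllib2

    urls = ['http://example.com?param=%d' % i for i in range(100)]  # assumed dummy URLs

    # each urlopen call blocks for about a second, so 100 URLs take ~100 seconds
    for url in urls:
        data = urllib2.urlopen(url).read()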
To fetch multiple URLs in parallel while limiting yourself to 20 connections at a time:
    import urllib2
    from multiprocessing.dummy import Pool

    def generate_urls():  # generate some dummy urls
        for i in range(100):
            yield 'http://example.com?param=%d' % i

    def get_url(url):
        try:
            return url, urllib2.urlopen(url).read(), None
        except EnvironmentError as e:
            return url, None, e

    pool = Pool(20)  # limit number of concurrent connections
    for url, result, error in pool.imap_unordered(get_url, generate_urls()):
        if error is None:
            print result,
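Note that get_url returns any exception instead of raising it, so the loop above silently drops failures. A small sketch of also reporting them and shutting the pool down cleanly (the error-message wording is just an assumption):

    for url, result, error in pool.imap_unordered(get_url, generate_urls()):
        if error is None:
            print result,
        else:
            print 'failed to fetch %s: %s' % (url, error)  # illustrative error report

    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker threads to finish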
Paul Seeb has correctly diagnosed your issue.
You are calling trade.update_items and then passing the result to the threading.Thread constructor. Thus you get serial behavior: your threads don't do any work, and the creation of each one is delayed until the update_items call returns.
The correct form is threading.Thread(target=trade.update_items, args=(1, 100)) for the first line, and similarly for the later ones. This passes the update_items function as the thread entry point and the (1, 100) tuple as its positional arguments.
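A minimal sketch of that pattern, assuming trade.update_items takes a start and end index (the Trade class and the later ranges are stand-ins, not the asker's real code):

    import threading

    class Trade(object):  # stand-in for the asker's trade object
        def update_items(self, start, end):
            print 'updating items %d-%d' % (start, end)

    trade = Trade()

    # each Thread gets the function itself plus its arguments, not the result of calling it
    threads = [
        threading.Thread(target=trade.update_items, args=(1, 100)),
        threading.Thread(target=trade.update_items, args=(101, 200)),
        threading.Thread(target=trade.update_items, args=(201, 300)),
    ]

    for t in threads:
        t.start()   # all three calls now run concurrently
    for t in threads:
        t.join()    # wait for them to finish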