I'm doing data scraping calls with urllib2, and each one takes around 1 second to complete. I wanted to test whether I could multi-thread the URL-call loop using threading.
To fetch multiple URLs in parallel, limited to 20 connections at a time:
import urllib2
from multiprocessing.dummy import Pool  # thread pool, suitable for I/O-bound work

def generate_urls():  # generate some dummy urls
    for i in range(100):
        yield 'http://example.com?param=%d' % i

def get_url(url):
    try:
        return url, urllib2.urlopen(url).read(), None
    except EnvironmentError as e:
        return url, None, e

pool = Pool(20)  # limit number of concurrent connections
for url, result, error in pool.imap_unordered(get_url, generate_urls()):
    if error is None:
        print result,
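
Note that multiprocessing.dummy.Pool is a thread pool (not processes), which is what you want for I/O-bound fetching, and imap_unordered yields each result as soon as its fetch finishes. If you are on Python 3, where urllib2 was split into urllib.request, a minimal sketch of the same pattern (with the same placeholder URLs) would look roughly like this:

import urllib.request
from multiprocessing.dummy import Pool  # still a thread pool despite the module name

def generate_urls():  # generate some dummy urls
    for i in range(100):
        yield 'http://example.com?param=%d' % i

def get_url(url):
    try:
        return url, urllib.request.urlopen(url).read(), None
    except EnvironmentError as e:  # URLError subclasses OSError, so this still catches it
        return url, None, e

pool = Pool(20)  # limit number of concurrent connections
for url, result, error in pool.imap_unordered(get_url, generate_urls()):
    if error is None:
        print(result, end=' ')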