I have a script that fetches several web pages and parses the info.
(An example can be seen at http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01 )
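Here's a standard-library solution. It isn't quite as fast as the threaded solutions, but it uses less memory. It sends every request before reading any response, so the servers can work on all of them at once while the script waits in a single thread.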
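    try:
        from http.client import HTTPConnection, HTTPSConnection
        from urllib.parse import urlsplit
    except ImportError:  # Python 2
        from httplib import HTTPConnection, HTTPSConnection
        from urlparse import urlsplit

    connections = []
    results = []

    # Send every request before reading any response, so the servers work on them in parallel.
    for url in urls:
        parts = urlsplit(url)  # also handles URLs with no path, e.g. 'http://example.com'
        conn_class = HTTPConnection if parts.scheme == 'http' else HTTPSConnection
        h = conn_class(parts.netloc)
        path = parts.path or '/'
        if parts.query:
            path += '?' + parts.query
        h.request('GET', path)
        connections.append(h)

    # Then read the responses, in the same order as the URLs.
    for h in connections:
        results.append(h.getresponse().read())
        h.close()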
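To check that the requests really overlap without touching the network, here is a minimal sketch (Python 3.7+ only) that wraps the same send-all-then-read idea in a hypothetical `fetch_all()` helper and runs it against a throwaway local server. The `SlowHandler` class, its 0.5-second delay and `start_demo_server()` are made up for the demo; they stand in for slow remote pages. If the requests overlap, four pages should arrive in roughly the time of one.

```python
import time
from http.client import HTTPConnection, HTTPSConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread
from urllib.parse import urlsplit


def fetch_all(urls):
    """Hypothetical helper: send every GET first, then read replies in order."""
    connections = []
    for url in urls:
        parts = urlsplit(url)
        conn_class = HTTPConnection if parts.scheme == "http" else HTTPSConnection
        conn = conn_class(parts.netloc, timeout=10)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)  # returns once the request has been sent
        connections.append(conn)

    results = []
    for conn in connections:
        try:
            results.append(conn.getresponse().read())  # blocks until this reply is in
        finally:
            conn.close()
    return results


class SlowHandler(BaseHTTPRequestHandler):
    """Demo-only handler: waits DELAY seconds, then echoes the request path."""
    DELAY = 0.5

    def do_GET(self):
        time.sleep(self.DELAY)
        body = self.path.encode()  # self.path includes the query string
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def start_demo_server():
    """Start a threaded local server on a free port; return (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, "http://%s:%d" % (host, port)


if __name__ == "__main__":
    server, base = start_demo_server()
    urls = [base + "/page%d?x=1" % i for i in range(4)]
    start = time.perf_counter()
    pages = fetch_all(urls)
    elapsed = time.perf_counter() - start
    server.shutdown()
    server.server_close()
    print(pages)
    print("fetched %d pages in %.2fs" % (len(pages), elapsed))
```

The demo server has to be threaded: a single-threaded server would answer one connection at a time and hide the overlap.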
Also, if most of your requests go to the same host, reusing a single HTTP connection (HTTP/1.1 keep-alive) would probably help more than doing things in parallel.
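Connection reuse can be sketched with the same `http.client` module: one `HTTPConnection` sends several requests in turn, and each response is read completely before the next one is fetched. The `fetch_same_host()` helper and the `KeepAliveHandler` demo server are hypothetical names for illustration. The server has to speak HTTP/1.1 and send `Content-Length` for the connection to stay open; against an HTTP/1.0 server, `http.client` quietly opens a new connection for each request instead.

```python
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread


def fetch_same_host(host, paths):
    """Hypothetical helper: fetch several paths over one reused connection."""
    conn = HTTPConnection(host, timeout=10)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            # Drain this response fully so the connection is free for the next one.
            results.append(conn.getresponse().read())
    finally:
        conn.close()
    return results


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Demo-only server: echoes the path and records which client socket asked."""
    protocol_version = "HTTP/1.1"  # keep connections open between requests
    seen_clients = []

    def do_GET(self):
        self.seen_clients.append(self.client_address)
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    host = "%s:%d" % server.server_address
    print(fetch_same_host(host, ["/a", "/b", "/c"]))
    # One distinct (ip, port) pair means a single TCP connection served every request.
    print(len(set(KeepAliveHandler.seen_clients)))
    server.shutdown()
    server.server_close()
```

This saves a TCP (and, for HTTPS, a TLS) handshake per page, but the requests still run one after another.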