How can I speed up fetching pages with urllib2 in python?

野的像风 — asked 2020-11-28 03:28

I have a script that fetches several web pages and parses the info.

(An example can be seen at http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01 )

11 Answers
  •  萌比男神i
    2020-11-28 03:56

    Here's a standard-library-only solution. It's not quite as fast as the threaded solutions, but it uses less memory.

    try:
        from http.client import HTTPConnection, HTTPSConnection  # Python 3
    except ImportError:
        from httplib import HTTPConnection, HTTPSConnection      # Python 2

    # 'urls' is your list of page URLs to fetch.
    connections = []
    results = []

    # Send all the requests first, without waiting for any response...
    for url in urls:
        scheme, _, host, path = url.split('/', 3)
        h = (HTTPConnection if scheme == 'http:' else HTTPSConnection)(host)
        h.request('GET', '/' + path)
        connections.append(h)

    # ...then collect the responses, so the servers' work overlaps with ours.
    for h in connections:
        results.append(h.getresponse().read())
    

    Also, if most of your requests go to the same host, then reusing a single HTTP connection would probably help more than doing things in parallel.
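
    To illustrate the connection-reuse point, here is a minimal sketch that sends several GET requests over one `HTTPConnection`. The tiny local server exists only to make the example self-contained and runnable; in practice you would point the connection at the real host. The handler sets `protocol_version = 'HTTP/1.1'` and a `Content-Length` header so keep-alive actually works, and each response must be fully read before the next request is sent.

    ```python
    import threading
    from http.client import HTTPConnection
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class EchoHandler(BaseHTTPRequestHandler):
        protocol_version = 'HTTP/1.1'  # keep-alive, so the connection is reused

        def do_GET(self):
            body = self.path.encode()
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example's output quiet

    # Stand-in server on an ephemeral port; replace with your real host.
    server = HTTPServer(('127.0.0.1', 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    conn = HTTPConnection('127.0.0.1', server.server_port)
    results = []
    for path in ['/a', '/b', '/c']:
        conn.request('GET', path)               # same TCP connection each time
        results.append(conn.getresponse().read())  # must read before next request
    conn.close()
    server.shutdown()

    print(results)  # [b'/a', b'/b', b'/c']
    ```

    Each request here skips the TCP (and, for HTTPS, TLS) handshake that a fresh connection would pay, which is often where most of the per-request latency goes.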
