How can I speed up fetching pages with urllib2 in python?

野的像风 — asked 2020-11-28 03:28

I have a script that fetches several web pages and parses the info.

(An example can be seen at http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01 )

11 Answers
  •  萌比男神i
    2020-11-28 03:56

    Here's a standard-library-only solution. It's not quite as fast as the threaded solutions, but it uses less memory.

    try:
        from http.client import HTTPConnection, HTTPSConnection  # Python 3
    except ImportError:
        from httplib import HTTPConnection, HTTPSConnection      # Python 2

    # 'urls' is your list of page URLs to fetch.
    connections = []
    results = []

    # Send all the requests first, without waiting for any response...
    for url in urls:
        scheme, _, host, path = url.split('/', 3)
        h = (HTTPConnection if scheme == 'http:' else HTTPSConnection)(host)
        h.request('GET', '/' + path)
        connections.append(h)

    # ...then collect the responses, so the servers' work overlaps with ours.
    for h in connections:
        results.append(h.getresponse().read())
    

    Also, if most of your requests go to the same host, then reusing a single HTTP connection would probably help more than doing things in parallel.
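
    To illustrate the connection-reuse point, here is a minimal sketch that sends several GET requests over one `HTTPConnection`. The tiny local server exists only to make the example self-contained and runnable; in practice you would point the connection at the real host. The handler sets `protocol_version = 'HTTP/1.1'` and a `Content-Length` header so keep-alive actually works, and each response must be fully read before the next request is sent.

    ```python
    import threading
    from http.client import HTTPConnection
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class EchoHandler(BaseHTTPRequestHandler):
        protocol_version = 'HTTP/1.1'  # keep-alive, so the connection is reused

        def do_GET(self):
            body = self.path.encode()
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example's output quiet

    # Stand-in server on an ephemeral port; replace with your real host.
    server = HTTPServer(('127.0.0.1', 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    conn = HTTPConnection('127.0.0.1', server.server_port)
    results = []
    for path in ['/a', '/b', '/c']:
        conn.request('GET', path)               # same TCP connection each time
        results.append(conn.getresponse().read())  # must read before next request
    conn.close()
    server.shutdown()

    print(results)  # [b'/a', b'/b', b'/c']
    ```

    Each request here skips the TCP (and, for HTTPS, TLS) handshake that a fresh connection would pay, which is often where most of the per-request latency goes.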
