How can I speed up fetching pages with urllib2 in python?

野的像风 2020-11-28 03:28

I have a script that fetches several web pages and parses the info.

(An example can be seen at http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01 )

11 Answers
  •  自闭症患者
    2020-11-28 04:05

    Here is a Python network benchmark script that helps identify whether a single connection is slow, and where the time goes (DNS lookup vs. the connection itself vs. urlopen overhead):

    """Python network test."""
    from socket import create_connection
    from time import time
    
    try:
        from urllib2 import urlopen
    except ImportError:
        from urllib.request import urlopen
    
    TIC = time()
    create_connection(('216.58.194.174', 80))
    print('Duration socket IP connection (s): {:.2f}'.format(time() - TIC))
    
    TIC = time()
    create_connection(('google.com', 80))
    print('Duration socket DNS connection (s): {:.2f}'.format(time() - TIC))
    
    TIC = time()
    urlopen('http://216.58.194.174')
    print('Duration urlopen IP connection (s): {:.2f}'.format(time() - TIC))
    
    TIC = time()
    urlopen('http://google.com')
    print('Duration urlopen DNS connection (s): {:.2f}'.format(time() - TIC))
    

    Example results with Python 3.6:

    Duration socket IP connection (s): 0.02
    Duration socket DNS connection (s): 75.51
    Duration urlopen IP connection (s): 75.88
    Duration urlopen DNS connection (s): 151.42
    

    Python 2.7.13 gives very similar results.

    In this case, the DNS and urlopen slowness are easy to separate: the hostname-based urlopen pays both the DNS cost and the urlopen overhead (75.51 s + 75.88 s ≈ 151.42 s), so the DNS lookup is the dominant problem here.
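    Separately from diagnosing a single slow connection, the usual way to speed up fetching *several* pages is to overlap the requests instead of fetching them one after another. A minimal sketch using a thread pool (the URL list below is hypothetical, not from the question):

    ```python
    # Sketch: fetch several pages concurrently with a thread pool,
    # so slow connections overlap instead of adding up.
    from concurrent.futures import ThreadPoolExecutor

    try:
        from urllib2 import urlopen          # Python 2
    except ImportError:
        from urllib.request import urlopen   # Python 3


    def fetch(url):
        """Fetch one page and return its body as bytes."""
        return urlopen(url).read()


    def fetch_all(urls, workers=10, fetcher=fetch):
        """Fetch all URLs in parallel; results keep the input order."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fetcher, urls))


    if __name__ == '__main__':
        # Hypothetical URL list for illustration.
        urls = ['http://example.com/page%d' % i for i in range(5)]
        pages = fetch_all(urls)
    ```

    With 10 workers, total time is roughly the slowest single request rather than the sum of all of them, which matters a lot when each request spends most of its time waiting on DNS or the network.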
