Python urllib2 does not respect timeout

白昼怎懂夜的黑 submitted on 2019-12-06 06:13:42

Question


The following two lines of code hang forever:

import urllib2
urllib2.urlopen('https://www.5giay.vn/', timeout=5)

This is with Python 2.7, and I have no http_proxy or any other relevant environment variables set. Every other website works fine, and I can also wget the site without any issue. What could be the problem?


Answer 1:


If you run

import urllib2

url = 'https://www.5giay.vn/'
urllib2.urlopen(url, timeout=1.0)

wait a few seconds, and then press Ctrl-C to interrupt the program, you'll see

  File "/usr/lib/python2.7/ssl.py", line 260, in read
    return self._sslobj.read(len)
KeyboardInterrupt

This shows that the program is hanging on self._sslobj.read(len).

SSL timeouts raise socket.timeout.

You can control the delay before socket.timeout is raised by calling socket.setdefaulttimeout(1.0).

For example,

import urllib2
import socket

socket.setdefaulttimeout(1.0)
url = 'https://www.5giay.vn/'
try:
    urllib2.urlopen(url, timeout=1.0)
except IOError as err:
    print('timeout')

% time script.py
timeout

real    0m3.629s
user    0m0.020s
sys 0m0.024s
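As a side note (not covered in the original answer), socket.setdefaulttimeout only changes the default applied to sockets created afterward; a per-socket settimeout still overrides it. A quick sketch:

```python
import socket

socket.setdefaulttimeout(1.0)   # default for sockets created from now on
s = socket.socket()             # picks up the 1.0 s default
print(s.gettimeout())           # 1.0

s.settimeout(5.0)               # a per-socket timeout overrides the default
print(s.gettimeout())           # 5.0
s.close()
```

This is why calling setdefaulttimeout before urllib2 opens its connection is enough to bound the SSL read.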

Note that the requests module succeeds here although urllib2 did not:

import requests
r = requests.get('https://www.5giay.vn/')

How to enforce a timeout on the entire function call:

socket.setdefaulttimeout only affects how long Python waits before an exception is raised if the server has not issued a response.

Neither it nor urlopen(..., timeout=...) enforces a time limit on the entire function call.

To do that, you could use eventlet, as shown here.

If you don't want to install eventlet, you could use multiprocessing from the standard library, though this solution will not scale as well as an asynchronous solution such as the one eventlet provides.

import urllib2
import socket
import multiprocessing as mp

def timeout(t, cmd, *args, **kwds):
    """Run cmd(*args, **kwds) in a worker process; raise mp.TimeoutError after t seconds."""
    pool = mp.Pool(processes=1)
    result = pool.apply_async(cmd, args=args, kwds=kwds)
    try:
        retval = result.get(timeout=t)
    except mp.TimeoutError:
        # Kill the worker so the hung call does not linger.
        pool.terminate()
        pool.join()
        raise
    else:
        return retval

def open(url):
    response = urllib2.urlopen(url)
    print(response)

url = 'https://www.5giay.vn/'
try:
    timeout(5, open, url)
except mp.TimeoutError as err:
    print('timeout')

Running this will either succeed or time out within about 5 seconds of wall-clock time.
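On Unix, another way to bound the entire call without spawning a process is SIGALRM. This is a minimal sketch under stated assumptions: Timeout and call_with_timeout are names made up for illustration, signal.alarm only accepts whole seconds, and this only works in the main thread:

```python
import signal
import time

class Timeout(Exception):
    pass

def _raise_timeout(signum, frame):
    # Signal handler: turn the alarm into an exception in the main thread.
    raise Timeout()

def call_with_timeout(seconds, func, *args, **kwds):
    """Run func(*args, **kwds), raising Timeout if it exceeds `seconds`."""
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)
    try:
        return func(*args, **kwds)
    finally:
        signal.alarm(0)                        # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)

try:
    call_with_timeout(1, time.sleep, 10)       # would block for 10 s otherwise
except Timeout:
    print('timeout')
```

Unlike the multiprocessing approach, this interrupts the hung call in place rather than abandoning a worker, but it cannot be used from non-main threads and composes poorly with code that also uses SIGALRM.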



Source: https://stackoverflow.com/questions/27327787/python-urllib2-does-not-respect-timeout
