问题
The following two lines of code hangs forever:
import urllib2
urllib2.urlopen('https://www.5giay.vn/', timeout=5)
This is with python2.7, and I have no http_proxy or any other env variables set. Any other website works fine. I can also wget the site without any issue. What could be the issue?
回答1:
If you run
import urllib2
url = 'https://www.5giay.vn/'
urllib2.urlopen(url, timeout=1.0)
wait for a few seconds, and then use C-c to interrupt the program, you'll see
File "/usr/lib/python2.7/ssl.py", line 260, in read
return self._sslobj.read(len)
KeyboardInterrupt
This shows that the program is hanging on self._sslobj.read(len).
SSL timeouts raise socket.timeout.
You can control the delay before socket.timeout is raised by calling
socket.setdefaulttimeout(1.0).
For example,
import urllib2
import socket
socket.setdefaulttimeout(1.0)
url = 'https://www.5giay.vn/'
try:
urllib2.urlopen(url, timeout=1.0)
except IOError as err:
print('timeout')
% time script.py
timeout
real 0m3.629s
user 0m0.020s
sys 0m0.024s
Note that the requests module succeeds here although urllib2 did not:
import requests
r = requests.get('https://www.5giay.vn/')
How to enforce a timeout on the entire function call:
socket.setdefaulttimeout only affects how long Python waits before an exception is raised if the server has not issued a response.
Neither it nor urlopen(..., timeout=...) enforce a time limit on the entire function call.
To do that, you could use eventlets, as shown here.
If you don't want to install eventlets, you could use multiprocessing from the standard library; though this solution will not scale as well as an asynchronous solution such as the one eventlets provides.
import urllib2
import socket
import multiprocessing as mp
def timeout(t, cmd, *args, **kwds):
pool = mp.Pool(processes=1)
result = pool.apply_async(cmd, args=args, kwds=kwds)
try:
retval = result.get(timeout=t)
except mp.TimeoutError as err:
pool.terminate()
pool.join()
raise
else:
return retval
def open(url):
response = urllib2.urlopen(url)
print(response)
url = 'https://www.5giay.vn/'
try:
timeout(5, open, url)
except mp.TimeoutError as err:
print('timeout')
Running this will either succeed or timeout in about 5 seconds of wall clock time.
来源:https://stackoverflow.com/questions/27327787/python-urllib2-does-not-respect-timeout