Question
Waiting on a concurrent.futures.Future from a ThreadPoolExecutor in a Tornado coroutine sometimes hangs for me:
    def slowinc(x):
        time.sleep(0.1)
        return x + 1

    yield tp_executor.submit(slowinc, 1)  # sometimes hangs
When I add a timeout, the call may hang for the full timeout period, but it then appears to return the correct result.
    yield gen.with_timeout(timedelta(seconds=5),
                           executor.submit(...))  # hangs for 5 sec, then works
This only happens in the context of a larger test suite, in which many unpleasant things occur (child processes are terminated mid-stride). This error is almost certainly related to this unpleasant context and not strictly the fault of Tornado. However, I believe that I have isolated these unpleasant things as well as possible.
So I apologize for the subtle bug and lack of a simple repeatable example. My hope is that the odd behavior of a failing timeout actually causing success is helpful in isolating my issue.
Temporary Solution
So far my solution is the following:
    while not future.done():
        try:
            yield gen.with_timeout(timedelta(seconds=1), future)
        except gen.TimeoutError:
            pass
    result = future.result()
This solves my immediate problem and, other than the occasional one second delay, is perfectly serviceable. I'm still confused by the behavior though and am curious what odd things I'm doing to trigger this.
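For reference, the polling loop above can be wrapped in a reusable coroutine. This is only a sketch: `wait_with_polling`, its `interval` parameter, and `main` are hypothetical names introduced here, and the code assumes a Tornado version in which `gen.with_timeout` accepts a `concurrent.futures.Future`, as it does in the snippets above.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import timedelta
import time

from tornado import gen
from tornado.ioloop import IOLoop


def slowinc(x):
    time.sleep(0.1)
    return x + 1


@gen.coroutine
def wait_with_polling(future, interval=1.0):
    """Hypothetical helper: wait for `future`, rechecking it every `interval` seconds."""
    while not future.done():
        try:
            # Each timeout schedules a wakeup on the IOLoop, so the loop
            # cannot sleep for longer than `interval` seconds at a time.
            yield gen.with_timeout(timedelta(seconds=interval), future)
        except gen.TimeoutError:
            pass  # not finished yet; go around and check again
    # result() re-raises any exception that the worker thread raised.
    raise gen.Return(future.result())


@gen.coroutine
def main():
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        result = yield wait_with_polling(executor.submit(slowinc, 1))
    finally:
        executor.shutdown()
    raise gen.Return(result)
```

Calling `IOLoop.current().run_sync(main)` runs the sketch to completion. As with the inline version, a missed wakeup costs at most one `interval` of extra latency.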
Update
The timeout solution above works for Python 3.4 under both Tornado 4.2 and 4.3, but neither the timeout solution nor the PeriodicCallback solution presented below in @ben-darnell's answer resolves this problem under Python 2.7 with Tornado 4.2 or 4.3.
Answer 1:
I don't have a complete solution, but I think I can offer a simpler workaround: start a background PeriodicCallback that does nothing at a short interval: PeriodicCallback(lambda: None, 500).start(). This makes sure the IOLoop wakes up periodically without intruding into all your yield executor.submit() calls.
The symptom suggests that the problem lies in the "waker" behavior of add_callback: https://github.com/tornadoweb/tornado/blob/d9c5bc8fb6530a03ebbb6da667e26685b8eee0ea/tornado/ioloop.py#L929-L944
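To illustrate the mechanism in question, here is a minimal self-pipe sketch using only the standard library. It is not Tornado's code: `TinyLoop`, `run_once`, `demo`, and the `simulate_missed_wake` flag are hypothetical names, and the sketch assumes Python 3 on a POSIX system (where `select` works on pipes). It shows that a callback added from another thread normally interrupts the blocking poll, and that if the thread-identity check wrongly concludes the caller is the loop thread, the callback only runs once the poll times out. That would be consistent with a pending timeout making the hang go away.

```python
import os
import select
import threading
import time


class TinyLoop:
    """Hypothetical, heavily simplified event loop with a waker pipe."""

    def __init__(self):
        self._reader, self._writer = os.pipe()
        self._callbacks = []
        self._lock = threading.Lock()
        self._thread_ident = None

    def add_callback(self, callback):
        with self._lock:
            self._callbacks.append(callback)
        # Only a foreign thread needs to interrupt select(); the loop
        # decides this by comparing thread identities.
        if threading.get_ident() != self._thread_ident:
            os.write(self._writer, b"x")

    def run_once(self, poll_timeout):
        """Block in select() until woken or timed out, then run callbacks."""
        self._thread_ident = threading.get_ident()
        readable, _, _ = select.select([self._reader], [], [], poll_timeout)
        if readable:
            os.read(self._reader, 1024)  # drain the waker pipe
        with self._lock:
            callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback()
        return bool(readable)

    def close(self):
        os.close(self._reader)
        os.close(self._writer)


def demo(poll_timeout, simulate_missed_wake=False):
    loop = TinyLoop()
    results = []

    def worker():
        time.sleep(0.1)
        if simulate_missed_wake:
            # Pretend the identity check misfires, so no byte is written.
            loop._thread_ident = threading.get_ident()
        loop.add_callback(lambda: results.append("done"))

    start = time.monotonic()
    thread = threading.Thread(target=worker)
    thread.start()
    try:
        woken = loop.run_once(poll_timeout)
    finally:
        thread.join()
        loop.close()
    return woken, results, time.monotonic() - start
```

With `demo(5.0)` the callback runs shortly after the worker adds it, well before the 5-second poll timeout. With `demo(0.5, simulate_missed_wake=True)` nothing wakes the loop, so the callback runs only when the 0.5-second poll expires.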
This code was changed in Tornado 4.3 (https://github.com/tornadoweb/tornado/pull/1511/files). If you're on 4.3, see if the problem still exists in 4.2. Could anything in your "unpleasant" environment be causing thread.get_ident() to behave differently than Tornado expects?
There are reports of (rare) problems with the waker "pipe" on Windows: https://github.com/tornadoweb/tornado/pull/1364
Source: https://stackoverflow.com/questions/33634956/why-would-a-timeout-avoid-a-tornado-hang