Long running workers blocking GIL timeout errors

Submitted by 耗尽温柔 on 2021-02-18 18:55:47

Question


I'm using dask-distributed with a local setup (a LocalCluster with 5 workers) on a dask.delayed workload. Most of the work is done by the vtk Python bindings. Since vtk is C++ based, I think the workers don't release the GIL while they are inside a long-running call. When I run the workload, my terminal prints a bunch of errors like this:

Traceback (most recent call last):
  File "C:\Users\patri\AppData\Local\Continuum\anaconda3\lib\site-packages\distributed\comm\core.py", line 221, in connect
    _raise(error)
  File "C:\Users\patri\AppData\Local\Continuum\anaconda3\lib\site-packages\distributed\comm\core.py", line 204, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:49721' after 10 s: connect() didn't finish in time

My workload continues fine, however: I get a bunch of errors on the command line, but it keeps chugging along. So I think the workers aren't crashing, but the heartbeat communication stops. Since I don't want to mess with vtk internals to release the GIL, how can I fix the errors? I get so many of these benign timeout errors that I can't see any real errors that might happen.


Answer 1:


Release the GIL temporarily by putting the VTK event loop thread to sleep. If you are using a vtkRenderWindowInteractor instance, create a timer with a callback that pauses execution briefly using time.sleep.
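A minimal sketch of that timer idea, assuming the interactor is an already initialized vtkRenderWindowInteractor. The helper name `install_gil_yield_timer` and its `interval_ms` and `pause_s` parameters are hypothetical and chosen only for illustration; `AddObserver` and `CreateRepeatingTimer` are the interactor methods it relies on.

```python
import time


def install_gil_yield_timer(interactor, interval_ms=100, pause_s=0.01):
    """Attach a repeating timer whose callback sleeps briefly.

    Assumption: `interactor` is an initialized vtkRenderWindowInteractor
    (or any object exposing AddObserver and CreateRepeatingTimer).
    """

    def yield_gil(caller, event):
        # time.sleep releases the GIL for the duration of the sleep,
        # giving other Python threads a chance to run.
        time.sleep(pause_s)

    # Register the callback for timer events, then start a repeating timer.
    observer_tag = interactor.AddObserver("TimerEvent", yield_gil)
    timer_id = interactor.CreateRepeatingTimer(interval_ms)
    return observer_tag, timer_id


# Hypothetical usage with VTK installed (render_window is assumed to exist):
#   import vtk
#   interactor = vtk.vtkRenderWindowInteractor()
#   interactor.SetRenderWindow(render_window)
#   interactor.Initialize()
#   install_gil_yield_timer(interactor)
#   interactor.Start()
```

Note that the timer only fires while the interactor's event loop is running. It likely will not help inside a single long-running VTK call, such as a filter update, that never returns to the event loop.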



Source: https://stackoverflow.com/questions/60019241/long-running-workers-blocking-gil-timeout-errors
