Is there really a timeout for kernels on NVIDIA GPUs?

Posted by 感情迁移 on 2019-12-10 16:59:02

Question


While searching for why my kernels produce strange error messages or results that are all "0", I found this answer on SO, which mentions a 5 s timeout for kernels running on NVIDIA GPUs. I googled for the timeout but could not find confirming sources or more information.

What do you know about it?

Could the timeout cause strange behaviour for kernels with a long runtime?

Thanks!


Answer 1:


Further googling brought up this in CUDA_Toolkit_Release_Notes_Linux.txt (Known Issues):

# Individual GPU program launches are limited to a run time of less than 5 seconds on a GPU with a display attached. Exceeding this time limit usually causes a launch failure reported through the CUDA driver or the CUDA runtime. GPUs without a display attached are not subject to the 5 second runtime restriction. For this reason it is recommended that CUDA be run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

[update] It seems that the official name for this feature is 'watchdog'.
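
To see whether a given GPU is subject to this watchdog, the CUDA runtime reports it in the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`, and a launch killed by the watchdog is reported as `cudaErrorLaunchTimeout`. The following is a minimal sketch, not a definitive diagnostic: the `touch` kernel and the `is_watchdog_timeout` helper are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel; it exists only so there is a launch to check.
__global__ void touch(int *out) { *out = 1; }

// Hypothetical helper: true if the error code is the one the runtime uses
// for a launch that exceeded the run time limit.
bool is_watchdog_timeout(cudaError_t err) {
    return err == cudaErrorLaunchTimeout;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device found\n");
        return 0;
    }

    // Report, per device, whether a run time limit applies to kernels.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        std::printf("device %d (%s): run time limit on kernels: %s\n",
                    dev, prop.name,
                    prop.kernelExecTimeoutEnabled ? "yes" : "no");
    }

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return 1;

    touch<<<1, 1>>>(d_out);
    // cudaGetLastError catches launch-configuration errors; execution errors
    // such as a timeout only surface once the device is synchronized.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    if (is_watchdog_timeout(err)) {
        std::printf("kernel was stopped by the watchdog\n");
    } else if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
    } else {
        std::printf("kernel finished normally\n");
    }

    cudaFree(d_out);
    return 0;
}
```

Checking the result of every launch this way is one way to tell a timeout apart from other causes of strange or all-zero results, since output buffers are typically left unwritten when a launch fails.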




Answer 2:


If you're on Windows Vista or later, the WDDM driver stack will automatically reset the device after about two seconds unless you tweak your TDR timeouts. (Windows can't tell the difference between a GPU running a lengthy kernel and a GPU that's locked up.) Tesla-branded cards running in TCC mode aren't subject to the normal display adapter restrictions and can therefore run longer kernels.
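
When the GPU also drives a display and the timeout cannot be changed, a common workaround (an assumption here, not something the answers above prescribe) is to split the work into several short launches so that no single kernel runs long enough to trigger a reset. The sketch below illustrates the idea with a hypothetical `scale_slice` kernel and `scale_in_chunks` helper; the chunk size of 65536 elements is arbitrary and would need tuning for a real workload.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical workload: scale the slice [offset, offset + n) of a larger array.
__global__ void scale_slice(float *data, size_t offset, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[offset + i] *= factor;
}

// Process `total` elements in slices of at most `chunk` elements, one launch
// per slice, so each individual kernel stays short. `chunk` must be > 0.
cudaError_t scale_in_chunks(float *d_data, size_t total, size_t chunk, float factor) {
    const unsigned threads = 256;
    for (size_t offset = 0; offset < total; offset += chunk) {
        size_t n = std::min(chunk, total - offset);
        unsigned blocks = (unsigned)((n + threads - 1) / threads);
        scale_slice<<<blocks, threads>>>(d_data, offset, n, factor);
        // Synchronize after each slice so an error is attributed to that slice.
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}

int main() {
    const size_t total = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, total * sizeof(float)) != cudaSuccess) {
        std::printf("allocation failed (no usable CUDA device?)\n");
        return 1;
    }
    cudaMemset(d_data, 0, total * sizeof(float));

    cudaError_t err = scale_in_chunks(d_data, total, 1 << 16, 2.0f);
    std::printf("result: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

The `tccDriver` field of `cudaDeviceProp` indicates whether a device is running under the TCC driver, in which case chunking for the sake of the display timeout should not be necessary.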



Source: https://stackoverflow.com/questions/5117961/is-there-really-a-timeout-for-kernels-on-nvidia-gpus
