Is there really a timeout for kernels on NVIDIA GPUs?

Question

While searching for why my kernels produce strange error messages or results that are all "0", I found an answer on SO which mentions a 5 s timeout for kernels running on NVIDIA GPUs. I googled the timeout but could not find any confirming sources or further information.

What do you know about it?

Could the timeout cause strange behaviour in kernels with a long runtime?

Thanks!

Answer 1:

Further googling turned up this entry in CUDA_Toolkit_Release_Notes_Linux.txt (Known Issues):

# Individual GPU program launches are limited to a run time of less than 5 seconds on a GPU with a display attached. Exceeding this time limit usually causes a launch failure reported through the CUDA driver or the CUDA runtime. GPUs without a display attached are not subject to the 5 second runtime restriction. For this reason it is recommended that CUDA be run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

[update] It seems that the official name for this feature is 'watchdog'.
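Since the limit applies to each individual launch, a common workaround on a GPU with a display attached is to split one long computation into several shorter launches. The C sketch below shows only the host-side loop. `launch_fn`, `run_in_chunks` and `count_items` are hypothetical names, and the callback stands in for whatever actually runs the kernel (in OpenCL 1.1 or later, for example, `clEnqueueNDRangeKernel` with a non-NULL `global_work_offset`, followed by `clFinish`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: performs one short kernel launch over the
   work items [offset, offset + count) and waits for it to finish.
   Returns 0 on success, nonzero on failure. */
typedef int (*launch_fn)(size_t offset, size_t count, void *user);

/* Split `total` work items into launches of at most `max_per_launch`
   items each, so that no single launch runs as long as the whole job.
   Returns the number of launches issued, or -1 if a launch failed. */
static long run_in_chunks(size_t total, size_t max_per_launch,
                          launch_fn launch, void *user)
{
    assert(launch != NULL && max_per_launch > 0);

    long launches = 0;
    size_t offset = 0;
    while (offset < total) {
        size_t remaining = total - offset;
        size_t count = remaining < max_per_launch ? remaining : max_per_launch;
        if (launch(offset, count, user) != 0)
            return -1;    /* stop at the first failed launch */
        offset += count;  /* never overshoots total */
        launches++;
    }
    return launches;
}

/* Stand-in launch used for testing: just counts the items it was given. */
static int count_items(size_t offset, size_t count, void *user)
{
    (void)offset;  /* unused in this stub */
    *(size_t *)user += count;
    return 0;
}
```

Because each call is assumed to wait for its own launch to complete, the per-launch limit applies to one chunk rather than to the whole job. The chunk size still has to be chosen from measurements of the actual kernel.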

Answer 2:

If you're on Windows Vista or later, the WDDM driver stack will automatically reset the device after about two seconds unless you adjust the TDR (Timeout Detection and Recovery) timeouts. (Windows can't tell the difference between a GPU running a lengthy kernel and a GPU that has locked up.) Tesla-branded cards running in TCC (Tesla Compute Cluster) mode aren't subject to the normal display adapter restrictions and can therefore run longer kernels.
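If a long kernel has to stay under this limit without changing the TDR settings, one rough approach is to split the work across several launches and size each launch from a measured per-item cost, leaving headroom below the limit (about two seconds by default under WDDM). The sketch below assumes the cost per work item is roughly constant; `max_items_per_launch` and its `safety` factor are hypothetical, not part of any driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest number of work items per launch whose
   estimated runtime stays within `safety` (0 < safety <= 1) of the
   watchdog limit, assuming each item costs about `seconds_per_item`.
   Returns 0 if even a single item would exceed the budget, in which
   case splitting the launch cannot help. */
static size_t max_items_per_launch(double seconds_per_item,
                                   double limit_seconds, double safety)
{
    /* Caller's responsibility: positive inputs and a sane safety factor. */
    assert(seconds_per_item > 0.0 && limit_seconds > 0.0);
    assert(safety > 0.0 && safety <= 1.0);

    double budget = limit_seconds * safety;  /* headroom below the limit */
    double items = budget / seconds_per_item;

    if (items < 1.0)
        return 0;
    if (items >= (double)SIZE_MAX)
        return SIZE_MAX;
    return (size_t)items;  /* truncation rounds down, staying within budget */
}
```

For example, if a test launch suggests about 0.25 s per item and the limit is 2 s, a safety factor of 0.5 gives 4 items per launch.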

Source: https://stackoverflow.com/questions/5117961/is-there-really-a-timeout-for-kernels-on-nvidia-gpus

Tags

opencl

nvidia

gpu-programming