CUDA program causes nvidia driver to crash

为君一笑 提交于 2019-11-27 09:09:49

If smaller numbers of trials work correctly, and if you are running on MS Windows without the NVIDIA Tesla Compute Cluster (TCC) driver and/or the GPU you are using is attached to a display, then you are probably exceeding the operating system's "watchdog" timeout. If the kernel occupies the display device (or any GPU on Windows without TCC) for too long, the OS will kill the kernel so that the system does not become non-interactive.

The solution is to run on a non-display-attached GPU and if you are on Windows, use the TCC driver. Otherwise, you will need to reduce the number of trials in your kernel and run the kernel multiple times to compute the number of trials you need.

EDIT: According to the CUDA 4.0 curand docs(page 15, "Performance Notes"), you can improve performance by copying the state for a generator to local storage inside your kernel, then storing the state back (if you need it again) when you are finished:

curandState state = states[i];

for(k = 0; k < trials; k++){
    x = curand_uniform(&state);
    y = curand_uniform(&state);
    z =(x*x + y*y);
    if (z <= 1.0f) incircle++;
}

Next, it mentions that setup is expensive, and suggests that you move curand_init into a separate kernel. This may help keep the cost of your MC kernel down so you don't run up against the watchdog.

I recommend reading that section of the docs, there are several useful guidelines.

Michal Hosala

For those of you having a geforce GPU which does not support TCC driver there is another solution based on:

http://msdn.microsoft.com/en-us/library/windows/hardware/ff569918(v=vs.85).aspx

  1. start regedit,
  2. navigate to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
  3. create new DWORD key called TdrLevel, set value to 0,
  4. restart PC.

Now your long-running kernels should not be terminated. This answer is based on:

Modifying registry to increase GPU timeout, windows 7

I just thought it might be useful to provide the solution here as well.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!