CUDA program causes NVIDIA driver to crash

遇见更好的自我 2020-12-01 20:35

My Monte Carlo pi calculation CUDA program is causing my NVIDIA driver to crash when I exceed around 500 trials and 256 full blocks. It seems to be happening in the monteCar

2 Answers
  •  长情又很酷
    2020-12-01 20:52

    If smaller numbers of trials work correctly, and you are running on MS Windows without the NVIDIA Tesla Compute Cluster (TCC) driver, or the GPU you are using is attached to a display (or both), then you are probably exceeding the operating system's "watchdog" timeout. If a kernel occupies the display device (or any GPU on Windows without TCC) for too long, the OS kills the kernel so that the system does not become unresponsive. On Windows this mechanism is Timeout Detection and Recovery (TDR), which by default resets the display driver after about two seconds; that reset is what looks like a driver crash.
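A quick way to check whether such a run-time limit applies to a particular GPU is to query its device properties with the CUDA runtime's `cudaGetDeviceProperties`. The sketch below is a minimal example; the helper name `watchdogMayApply` is hypothetical. It reads the real `cudaDeviceProp` fields `kernelExecTimeoutEnabled` (whether kernels on this device have a run-time limit) and `tccDriver` (whether the device is using the TCC driver).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if kernels on `device` are subject to a
// run-time limit (such as the OS watchdog), or if the query itself fails.
bool watchdogMayApply(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return true;  // be conservative when the properties cannot be read
    }
    std::printf("%s: kernelExecTimeoutEnabled=%d, tccDriver=%d\n",
                prop.name, prop.kernelExecTimeoutEnabled, prop.tccDriver);
    return prop.kernelExecTimeoutEnabled != 0;
}
```

If this returns true for the GPU you launch on, assume each kernel launch must finish well within the timeout.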

    One solution is to run on a GPU that is not attached to a display and, if you are on Windows, to use the TCC driver. Otherwise, you will need to reduce the number of trials per kernel launch and run the kernel multiple times, accumulating the partial results until you reach the total number of trials you need.
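The "run the kernel multiple times" approach can be sketched as follows. This is a minimal sketch under stated assumptions, not the asker's code: the kernel names, the helper `estimatePi`, and the 256 x 256 launch size are hypothetical. Generator states are initialized once and saved at the end of every short launch, so each launch continues the same random sequence, and hits are accumulated on the device with `atomicAdd`.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical launch configuration; tune for your GPU.
constexpr int kBlocks = 256;
constexpr int kThreads = 256;

// One-time setup: give every thread its own cuRAND subsequence.
__global__ void setupStates(curandState *states, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, i, 0, &states[i]);
}

// Short kernel: each thread draws only `trialsPerLaunch` points, then saves
// its generator state so the next launch continues the same sequence.
__global__ void monteCarloChunk(curandState *states, int trialsPerLaunch,
                                unsigned long long *hits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state = states[i];
    unsigned long long inCircle = 0;
    for (int k = 0; k < trialsPerLaunch; ++k) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++inCircle;
    }
    states[i] = state;
    atomicAdd(hits, inCircle);  // accumulate across threads and launches
}

// Splits the work into `launches` short kernels instead of one long one.
// Returns the pi estimate, or -1.0 if any CUDA call fails.
double estimatePi(int launches, int trialsPerLaunch, unsigned long long seed)
{
    const int n = kBlocks * kThreads;
    curandState *states = nullptr;
    unsigned long long *hits = nullptr;
    if (cudaMalloc(&states, n * sizeof(curandState)) != cudaSuccess) return -1.0;
    if (cudaMalloc(&hits, sizeof(unsigned long long)) != cudaSuccess) {
        cudaFree(states);
        return -1.0;
    }
    cudaError_t err = cudaMemset(hits, 0, sizeof(unsigned long long));

    if (err == cudaSuccess) {
        setupStates<<<kBlocks, kThreads>>>(states, seed);
        for (int l = 0; l < launches; ++l) {
            monteCarloChunk<<<kBlocks, kThreads>>>(states, trialsPerLaunch, hits);
        }
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    unsigned long long total = 0;
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernels and reports execution errors.
        err = cudaMemcpy(&total, hits, sizeof total, cudaMemcpyDeviceToHost);
    }
    cudaFree(states);
    cudaFree(hits);
    if (err != cudaSuccess) return -1.0;
    return 4.0 * double(total) / (double(n) * launches * trialsPerLaunch);
}
```

For example, `estimatePi(10, 50, 1234ULL)` draws 500 points per thread in total, the same sequence a single 500-trial launch would draw, but spread over ten shorter kernels. Pick `trialsPerLaunch` small enough that one launch stays well under the timeout.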

    EDIT: According to the CUDA 4.0 CURAND docs (page 15, "Performance Notes"), you can improve performance by copying the state of a generator to local storage inside your kernel, then storing the state back (if you need it again) when you are finished:

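    #include <curand_kernel.h>

    // Illustrative kernel; its name and parameters are hypothetical.
    __global__ void monteCarlo(curandState *states, int trials, unsigned int *counts)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        curandState state = states[i];   // copy the generator state to local storage
        unsigned int incircle = 0;
        for (int k = 0; k < trials; k++) {
            float x = curand_uniform(&state);
            float y = curand_uniform(&state);
            if (x*x + y*y <= 1.0f) incircle++;
        }
        states[i] = state;               // store the state back if it is needed again
        counts[i] = incircle;
    }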
    

    Next, the docs note that generator setup is expensive and suggest moving curand_init into a separate kernel. This keeps the cost of your Monte Carlo (MC) kernel down, which may help it stay under the watchdog limit.
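To see on your own hardware how much time goes into setup versus sampling, you can put curand_init in its own kernel and time the two kernels separately with CUDA events. This is a minimal sketch; the kernel names, the helper `timeSetupAndSampling`, and the fixed seed `1234ULL` are arbitrary choices for illustration. Only the sampling kernel would need to be relaunched afterwards, since it saves the generator state back.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Setup kernel: curand_init runs here, once per thread.
__global__ void initStates(curandState *states, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, i, 0, &states[i]);
}

// Sampling kernel: no curand_init, only curand_uniform calls.
__global__ void sample(curandState *states, int trials, unsigned int *counts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state = states[i];  // local copy of the generator state
    unsigned int inCircle = 0;
    for (int k = 0; k < trials; ++k) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++inCircle;
    }
    states[i] = state;  // store it back for any later launch
    counts[i] = inCircle;
}

// Times the two kernels separately, in milliseconds. Returns false on error.
bool timeSetupAndSampling(int blocks, int threads, int trials,
                          float *setupMs, float *sampleMs)
{
    const int n = blocks * threads;
    curandState *states = nullptr;
    unsigned int *counts = nullptr;
    if (cudaMalloc(&states, n * sizeof(curandState)) != cudaSuccess) return false;
    if (cudaMalloc(&counts, n * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(states);
        return false;
    }
    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);

    cudaEventRecord(t0);
    initStates<<<blocks, threads>>>(states, 1234ULL);
    cudaEventRecord(t1);
    sample<<<blocks, threads>>>(states, trials, counts);
    cudaEventRecord(t2);

    cudaError_t err = cudaEventSynchronize(t2);   // wait for both kernels
    if (err == cudaSuccess) err = cudaGetLastError();
    if (err == cudaSuccess) {
        cudaEventElapsedTime(setupMs, t0, t1);
        cudaEventElapsedTime(sampleMs, t1, t2);
    }
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaFree(states);
    cudaFree(counts);
    return err == cudaSuccess;
}
```

Calling it with the configuration from the question, for example `timeSetupAndSampling(256, 256, 500, &setupMs, &sampleMs)`, shows whether setup or sampling is the part approaching the watchdog limit.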

    I recommend reading that section of the docs; it contains several other useful guidelines.
