I am new to CUDA. When I multiply 1024x1024 matrices and launch a kernel with:
multiplyKernel << > >(d
On Windows, I right-clicked the Nsight Monitor icon in the system tray and chose Options > General. There you will find the WDDM TDR Delay setting, which is given in seconds. It was set to 2, and I increased it to 10. Then I ran my program again, and it worked fine. This follows Robert's link (see above): http://http.developer.nvidia.com/NsightVisualStudio/2.2/Documentation/UserGuide/HTML/Content/Timeout_Detection_Recovery.htm
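If you want to confirm that the watchdog is what is stopping the kernel, here is a minimal sketch of how the program itself could check. It assumes device 0 is the GPU in use. `busyKernel` and `isWatchdogTimeout` are hypothetical names made up for this illustration, and `busyKernel` is only a stand-in for the real `multiplyKernel`. Depending on the driver, a TDR reset may surface as a different error code than `cudaErrorLaunchTimeout`, so treat this as a hint rather than a definitive test.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; substitute the real long-running kernel.
__global__ void busyKernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * i;
}

// Hypothetical helper: true for the error code the runtime documents
// for a kernel that exceeded the watchdog time limit.
bool isWatchdogTimeout(cudaError_t err) {
    return err == cudaErrorLaunchTimeout;
}

int main() {
    // Assumption: device 0 is the GPU running the kernel.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Nonzero when the device enforces a run-time limit on kernels
    // (typical for a GPU that also drives a display under WDDM).
    printf("%s: kernel run-time limit %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");

    const int n = 1 << 20;
    float *d_out = nullptr;
    err = cudaMalloc(&d_out, n * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    busyKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    err = cudaGetLastError();            // catches invalid launch configurations
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();   // catches errors raised while the kernel runs
    }

    if (isWatchdogTimeout(err)) {
        printf("Kernel was stopped by the watchdog timer: %s\n",
               cudaGetErrorString(err));
    } else if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    } else {
        printf("Kernel finished without error.\n");
    }

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

If the run-time limit is reported as enabled and the kernel fails only for large inputs, raising the TDR delay as described above (or splitting the work across several shorter kernel launches) is the usual way out.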