CUDA matrix multiplication breaks for large matrices
Question

I have the following matrix multiplication code, implemented using CUDA 3.2 and VS 2008. I am running Windows Server 2008 R2 Enterprise with an NVIDIA GTX 480. The code works fine for values of "Width" (the matrix width) up to about 2500 or so.

    int size = Width*Width*sizeof(float);
    float* Md, *Nd, *Pd;
    cudaError_t err = cudaSuccess;

    //Allocate Device Memory for M, N and P
    err = cudaMalloc((void**)&Md, size);
    err = cudaMalloc((void**)&Nd, size);
    err = cudaMalloc((void**)&Pd,