I am currently writing a matrix multiplication on a GPU and would like to debug my code, but since I cannot use printf inside a device function, is there something else I c
See the "Formatted output" section (currently B.17) of the CUDA C Programming Guide.
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
by the way..
EDIT
To avoid misleading people: as M. Tibbits points out, printf is available on any GPU of compute capability 2.0 and higher.
END OF EDIT
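For anyone landing here, a minimal sketch of what device-side printf looks like. The kernel name debugKernel and the test data are made up for illustration and are not from the question. It assumes a GPU of compute capability 2.0 or higher and a build with nvcc.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints its indices and the element it reads.
__global__ void debugKernel(const float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Device-side printf needs compute capability 2.0 or higher.
        printf("block %d thread %d: data[%d] = %f\n",
               blockIdx.x, threadIdx.x, i, data[i]);
    }
}

int main()
{
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i)
        host[i] = 1.5f * i;

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // 2 blocks of 4 threads, one thread per element.
    debugKernel<<<2, 4>>>(dev, n);

    // Kernel printf output is buffered on the device and flushed at
    // synchronization points, so wait for the kernel before exiting.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}
```

The order of the printed lines across threads is not guaranteed, so it helps to include the block and thread indices in every message. When debugging a large matrix, guarding the printf with a condition on a few indices keeps the output readable.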
You have choices:
Regarding your code snippet:
Pass the Matrix structs in via pointer (i.e. cudaMemcpy them to the device, then pass in the device pointer). Right now you will have no problem, but if the function signature gets very large then you may hit the 256 byte limit.

CUDA now supports printfs directly in the kernel. For a formal description see Appendix B.16 of the CUDA C Programming Guide.
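On the suggestion above to pass the Matrix structs by pointer, here is a rough sketch of the pattern. The question's actual Matrix definition is not shown here, so the layout below (width, height, elements) is an assumption, and scaleKernel is a made-up stand-in for the real multiplication kernel. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed layout; the real Matrix struct from the question may differ.
struct Matrix {
    int width;
    int height;
    float *elements;   // must point to device memory when used in a kernel
};

// Hypothetical kernel: takes pointers to structs that live in device memory,
// so the parameter list stays small no matter how large Matrix becomes.
__global__ void scaleKernel(const Matrix *in, Matrix *out, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int count = in->width * in->height;
    if (i < count)
        out->elements[i] = factor * in->elements[i];
}

int main()
{
    const int w = 4, h = 2, count = w * h;
    float hostData[count];
    for (int i = 0; i < count; ++i)
        hostData[i] = (float)i;

    // Host-side structs whose element pointers refer to device buffers.
    Matrix hIn  = {w, h, nullptr};
    Matrix hOut = {w, h, nullptr};
    cudaMalloc(&hIn.elements,  count * sizeof(float));
    cudaMalloc(&hOut.elements, count * sizeof(float));
    cudaMemcpy(hIn.elements, hostData, count * sizeof(float),
               cudaMemcpyHostToDevice);

    // Copy the structs themselves to the device and pass device pointers.
    Matrix *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn,  sizeof(Matrix));
    cudaMalloc(&dOut, sizeof(Matrix));
    cudaMemcpy(dIn,  &hIn,  sizeof(Matrix), cudaMemcpyHostToDevice);
    cudaMemcpy(dOut, &hOut, sizeof(Matrix), cudaMemcpyHostToDevice);

    scaleKernel<<<1, count>>>(dIn, dOut, 2.0f);

    // This blocking copy also waits for the kernel to finish.
    float result[count];
    cudaMemcpy(result, hOut.elements, count * sizeof(float),
               cudaMemcpyDeviceToHost);
    for (int i = 0; i < count; ++i)
        printf("result[%d] = %f\n", i, result[i]);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(hIn.elements);
    cudaFree(hOut.elements);
    return 0;
}
```

The thing to watch is that the elements pointer inside each struct has to be a device pointer before the struct is copied over. Copying a struct that still holds a host pointer will compile fine and then fail when the kernel dereferences it.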