I am currently writing a matrix multiplication on a GPU and would like to debug my code, but since I cannot use printf inside a device function, is there something else I c
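You can use printf in device code: it is supported on devices of compute capability 2.0 and higher. The output is buffered on the device and shown on the host at synchronization points such as cudaDeviceSynchronize(). See the "Formatted Output" section (currently B.17) of the CUDA C Programming Guide.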
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
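
As a rough illustration of what that section describes, here is a minimal sketch of a naive matrix multiplication kernel that calls printf from device code. The names matMulDebug and runDebugMatMul, the 2x2 input values, and the 16x16 block size are made up for this example and are not taken from your code; error checking is kept minimal for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical naive kernel: each thread computes one element of C = A * B
// for square n x n matrices stored in row-major order.
__global__ void matMulDebug(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    float sum = 0.0f;
    for (int k = 0; k < n; ++k)
        sum += A[row * n + k] * B[k * n + col];
    C[row * n + col] = sum;

    // Device-side printf. Restrict it to a few threads, otherwise the
    // device printf buffer fills up and the output becomes unreadable.
    if (row < 2 && col < 2)
        printf("C[%d][%d] = %f\n", row, col, sum);
}

// Hypothetical host helper: multiplies two fixed 2x2 matrices on the GPU
// and copies the 4 result values into hC.
cudaError_t runDebugMatMul(float *hC)
{
    assert(hC != nullptr);
    const int n = 2;
    const float hA[n * n] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float hB[n * n] = {5.0f, 6.0f, 7.0f, 8.0f};
    const size_t bytes = n * n * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matMulDebug<<<grid, block>>>(dA, dB, dC, n);

    // Synchronizing waits for the kernel and flushes the printf buffer.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}

int main()
{
    float hC[4];
    cudaError_t err = runDebugMatMul(hC);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Compiled with nvcc, this should print one line per element of the 2x2 result. The order of the lines is not guaranteed, because threads do not run in a fixed order. If nothing is printed, check that the kernel actually launched and that the host synchronizes before exiting.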