I am currently writing a matrix multiplication on a GPU and would like to debug my code, but since I can not use printf inside a device function, is there something else I c
CUDA now supports printfs directly in the kernel. For formal description see Appendix B.16 of the CUDA C Programming Guide.
printf