printf inside CUDA __global__ function

你的背包 2020-12-29 21:46

I am currently writing a matrix multiplication on a GPU and would like to debug my code, but since I cannot use printf inside a device function, is there something else I can do?

4 Answers
  • 2020-12-29 22:08

    See "Formatted output" (currently B.17) section of CUDA C Programming Guide.

    http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
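    A minimal sketch of what that section describes (assumes a device of compute capability 2.0 or higher; the kernel and array names are made up for illustration):

    ```cuda
    #include <cstdio>

    // In-kernel printf, available on compute capability 2.0+.
    __global__ void debugKernel(const float *a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            printf("thread %d: a[%d] = %f\n", i, i, a[i]);
    }

    int main()
    {
        const int n = 4;
        float h[n] = {1.f, 2.f, 3.f, 4.f};
        float *d;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

        debugKernel<<<1, n>>>(d, n);
        cudaDeviceSynchronize();  // flushes the device-side printf buffer
        cudaFree(d);
        return 0;
    }
    ```

    Note that output is buffered on the device and is only guaranteed to appear after a synchronizing call such as cudaDeviceSynchronize().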

  • 2020-12-29 22:16
    • cuprintf
    • try Nexus http://developer.nvidia.com/object/nexus.html

    By the way:

    • use shared memory
    • multiply outside of the loop
    • Look at this: http://www.seas.upenn.edu/~cis665/LECTURES/Lecture11.ppt
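    The shared-memory advice above can be sketched as a tiled multiply. This is a generic illustration, not the asker's code: it computes C = A * B for n x n row-major matrices and assumes n is a multiple of the tile size. Each tile of A and B is staged in shared memory, so each global element is read once per tile rather than once per output element:

    ```cuda
    #define TILE 16

    __global__ void matMulTiled(const float *A, const float *B, float *C, int n)
    {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < n / TILE; ++t) {
            // Each thread loads one element of the current A and B tiles.
            As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
            __syncthreads();  // tile fully loaded before anyone reads it

            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();  // done reading before the tile is overwritten
        }
        C[row * n + col] = acc;
    }
    ```

    Launch it with a TILE x TILE block and an (n/TILE) x (n/TILE) grid.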
  • 2020-12-29 22:17

    EDIT

    To avoid misleading people, as M. Tibbits points out printf is available in any GPU of compute capability 2.0 and higher.

    END OF EDIT

    You have choices:

    • Use a GPU debugger, e.g. cuda-gdb on Linux or Nexus on Windows
    • Use cuprintf, which is available for registered developers (sign up here)
    • Manually copy the data that you want to see, then dump that buffer on the host after your kernel has completed (remember to synchronise)
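    The third option above can be sketched like this (the kernel and its doubling computation are hypothetical stand-ins; the point is the dedicated debug buffer and the synchronize-then-copy step):

    ```cuda
    #include <cstdio>

    __global__ void kernelWithDebug(const float *in, float *out, float *dbg, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = in[i] * 2.0f;  // stand-in for the real computation
            dbg[i] = v;              // record the intermediate value to inspect
            out[i] = v + 1.0f;
        }
    }

    int main()
    {
        const int n = 8;
        float h_in[n], h_dbg[n];
        for (int i = 0; i < n; ++i) h_in[i] = (float)i;

        float *d_in, *d_out, *d_dbg;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        cudaMalloc(&d_dbg, n * sizeof(float));
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

        kernelWithDebug<<<1, n>>>(d_in, d_out, d_dbg, n);
        cudaDeviceSynchronize();  // ensure the kernel has finished

        // Dump the debug buffer on the host.
        cudaMemcpy(h_dbg, d_dbg, n * sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i)
            printf("dbg[%d] = %f\n", i, h_dbg[i]);

        cudaFree(d_in); cudaFree(d_out); cudaFree(d_dbg);
        return 0;
    }
    ```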

    Regarding your code snippet:

    • Consider passing the Matrix structs in via pointer (i.e. cudaMemcpy them to the device, then pass in the device pointer). Right now you will have no problem, but if the function signature gets very large you may hit the 256-byte limit on kernel arguments.
    • You have inefficient reads from Ad: there will be a 32-byte memory transaction for each read into Melement. Consider using shared memory as a staging area (c.f. the transposeNew sample in the SDK).
  • 2020-12-29 22:25

    CUDA now supports printf directly in the kernel. For a formal description, see Appendix B.16 of the CUDA C Programming Guide.
