Interpreting the verbose output of ptxas, part I

北慕城南 提交于 2019-11-28 21:22:12
  • Each CUDA thread is using 46 registers? Yes, correct
  • There is no register spilling to local memory? Yes, correct
  • Is 72 bytes the sum-total of the memory for the stack frames of the __global__ and __device__ functions? Yes, correct
  • What is the difference between 0 byte spill stores and 0 bytes spill loads?
    • Fair question, the loads could be greater than the stores since you could spill a computed value, load it once, discard it (i.e. store something else into that register) then load it again (i.e. reuse it). Update: note also that the spill load/store count is based on static analysis as described by @njuffa in the comments below
  • Why is the information for cmem (which I am assuming is constant memory) repeated twice with different figures? Within the kernel I am not using any constant memory. Does that mean the compiler is, under the hood, going to tell the GPU to use some constant memory?
    • Constant memory is used for a few purposes including __constant__ variables and kernel arguments, different "banks" are used, that starts to get a bit detailed but as long as you use less than 64KB for your __constant__ variables and less than 4KB for kernel arguments you will be ok.
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!