Registers and shared memory depending on compiling compute capability?

Posted by 三世轮回 on 2019-12-13 15:27:52

Question


Hey there, when I compile with nvcc -arch=sm_13 I get:

ptxas info    : Used 29 registers, 28+16 bytes smem, 7200 bytes cmem[0], 8 bytes cmem[1] 

when I use nvcc -arch=sm_20 I get:

ptxas info    : Used 34 registers, 60 bytes cmem[0], 7200 bytes cmem[2], 4 bytes cmem[16] 

I thought all kernel parameters were passed in shared memory, but that doesn't seem to be the case for sm_20. Are they perhaps passed in registers instead? The signature of my kernel looks like this:

__global__ void func(double *, double , double, int)

Thanks so far!


Answer 1:


On compute capability 2.x devices, kernel arguments are stored in constant memory rather than shared memory, which is why no smem figure appears in the sm_20 output. The register difference is probably down to differences in the code generated for math library functions between the two targets. Does the kernel use things like transcendental functions or sqrt?
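
To make the shared memory figure concrete, here is a rough host-side sketch (plain C++ that also compiles under nvcc) that adds up the bytes needed for the arguments of `func(double *, double, double, int)`. The helpers `align_up` and `param_bytes` are hypothetical names made up for this illustration, and the layout rule (each argument placed at its natural alignment) is an assumption, not something ptxas reports.

```cpp
#include <cstddef>

// Hypothetical helper: round offset up to the next multiple of align.
static std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Bytes occupied by the arguments of func(double *, double, double, int),
// assuming each argument sits at its natural alignment and there is no
// padding after the last one.
static std::size_t param_bytes() {
    std::size_t off = 0;
    off = align_up(off, alignof(double *)) + sizeof(double *);
    off = align_up(off, alignof(double)) + sizeof(double);
    off = align_up(off, alignof(double)) + sizeof(double);
    off = align_up(off, alignof(int)) + sizeof(int);
    return off;
}
```

On a typical 64-bit build `param_bytes()` comes out at 28, which appears to match the 28 in `28+16 bytes smem` from the sm_13 output. As far as I recall, the extra 16 bytes are reserved by the system for built-in variables on 1.x devices, but treat that as an assumption.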




Answer 2:


As @talonmies states, shared memory differences are due to SM 2.x devices passing kernel arguments via constant rather than shared memory.

However, one of the main differences in register usage on SM 2.x devices is that SM 1.x devices have dedicated address registers for load and store instructions, whereas SM 2.x uses general-purpose registers for addresses. This tends to increase register pressure on SM 2.x. Fortunately, the register file is also twice as large on GF100 (SM 2.0) as on GT200 (SM 1.3).
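
As a back-of-the-envelope check on what the larger register file means for the two ptxas outputs in the question, the sketch below computes a register-imposed upper bound on resident threads per multiprocessor. The function name `max_threads_by_registers` is hypothetical. The hardware limits used in the note afterwards (16384 registers and 1024 threads per multiprocessor for SM 1.3, 32768 registers and 1536 threads for SM 2.0) are my recollection of the compute capability tables and should be checked against the CUDA programming guide. The bound ignores register allocation granularity, block size, and shared memory, so real occupancy can be lower.

```cpp
#include <algorithm>

// Upper bound on resident threads per multiprocessor when registers are the
// only limiting resource. All three inputs are supplied by the caller.
static int max_threads_by_registers(int regs_per_sm,
                                    int regs_per_thread,
                                    int max_threads_per_sm) {
    return std::min(regs_per_sm / regs_per_thread, max_threads_per_sm);
}
```

Under those assumed limits, `max_threads_by_registers(16384, 29, 1024)` gives 564 for the sm_13 build and `max_threads_by_registers(32768, 34, 1536)` gives 963 for the sm_20 build, so the five extra registers per thread are more than offset by the doubled register file.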



Source: https://stackoverflow.com/questions/6038221/registers-and-shared-memory-depending-on-compiling-compute-capability
