Question
Hey there,
When I compile with nvcc -arch=sm_13, I get:
ptxas info : Used 29 registers, 28+16 bytes smem, 7200 bytes cmem[0], 8 bytes cmem[1]
When I compile with nvcc -arch=sm_20, I get:
ptxas info : Used 34 registers, 60 bytes cmem[0], 7200 bytes cmem[2], 4 bytes cmem[16]
I thought all kernel parameters were passed in shared memory, but for sm_20 that doesn't seem to be the case. Are they perhaps passed in registers as well? The signature of my kernel looks like this:
__global__ void func(double *, double, double, int)
Thanks so far!
Answer 1:
On compute capability 2.x devices, kernel arguments are stored in constant memory rather than shared memory, which is why the smem figure disappears from the sm_20 output. The register difference is probably down to differences in the code generated for math library functions between the two targets. Does the kernel use things like transcendental functions or sqrt?
Answer 2:
As @talonmies states, shared memory differences are due to SM 2.x devices passing kernel arguments via constant rather than shared memory.
However one of the main differences in register usage in SM 2.x devices is the fact that while SM 1.x devices have dedicated address registers for load and store instructions, SM 2.x uses general-purpose registers for addresses. This tends to increase register pressure on SM 2.x. Luckily the register file is also 2x larger on GF100 (SM 2.0) vs. GT200 (SM 1.3).
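The byte counts in the two ptxas lines appear consistent with the parameter-passing explanation above. Below is a minimal host-side sketch of the arithmetic. The helper names (`Param`, `param_block_bytes`, `func_param_bytes`) are made up for illustration, 8-byte device pointers (a 64-bit build) are an assumption, and the 32-byte offset before the parameters in cmem[0] on sm_20 is an assumption inferred from 60 - 28 rather than something stated in the thread.

```cpp
#include <cstddef>
#include <initializer_list>

// Size and natural alignment of one kernel parameter, in bytes.
struct Param {
    std::size_t size;
    std::size_t align;
};

// Lay the parameters out in declaration order, rounding each offset up
// to the parameter's alignment, and return the offset just past the last one.
inline std::size_t param_block_bytes(std::initializer_list<Param> params) {
    std::size_t offset = 0;
    for (const Param& p : params) {
        offset = (offset + p.align - 1) / p.align * p.align;  // align up
        offset += p.size;
    }
    return offset;
}

// Parameter block of func(double *, double, double, int).
// ASSUMPTION: device pointers are 8 bytes (64-bit build).
inline std::size_t func_param_bytes() {
    return param_block_bytes({{8, 8}, {8, 8}, {8, 8}, {4, 4}});
}

// ASSUMPTION: on sm_20 the parameters sit after 32 bytes of
// launch information at the start of cmem[0].
const std::size_t kAssumedSm20ParamOffset = 32;

inline std::size_t sm20_cmem0_bytes() {
    return kAssumedSm20ParamOffset + func_param_bytes();
}
```

Under these assumptions `func_param_bytes()` gives 28, matching the "28" in "28+16 bytes smem" for sm_13, and `sm20_cmem0_bytes()` gives 60, matching "60 bytes cmem[0]" for sm_20. The 7200-byte constant bank simply moves from cmem[0] to cmem[2].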
Source: https://stackoverflow.com/questions/6038221/registers-and-shared-memory-depending-on-compiling-compute-capability