Changing the arch argument in CUDA makes me use more registers

Posted by 戏子无情 on 2019-12-11 15:04:36

Question


I have been writing a kernel for my Tesla K20m. When I compile the software with -Xptxas=-v, I get the following output:

ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z9searchKMPPciPhiPiS1_' for 'sm_10'
ptxas info    : Used 8 registers, 80 bytes smem, 8 bytes cmem[1]

As you can see, only 8 registers are used. However, if I add the argument -arch=sm_35, the kernel's execution time rises dramatically, and so does the number of registers used. I am wondering why.

nvcc mysoftware.cu -Xptxas=-v -arch=sm_35 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z9searchKMPPciPhiPiS1_' for 'sm_35'
ptxas info    : Function properties for _Z9searchKMPPciPhiPiS1_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 21 registers, 16 bytes smem, 368 bytes cmem[0]

Since multiple books mention that compiling for the card's actual architecture is supposed to improve performance, I wonder why mine is decreasing so dramatically.

Thanks.

Edit: Similar question and answer: "Registers and shared memory depending on compiling compute capability?"


Answer 1:


Compiling for sm_20 and above enables IEEE-compliant math and ABI compliance. Both can increase the register count and decrease performance, and both can be disabled.
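
As a rough sketch of what disabling these could look like on the asker's command line: nvcc's -ftz, -prec-div and -prec-sqrt flags relax the IEEE-compliant single-precision defaults used for sm_20 and above (-use_fast_math implies the same settings). The ABI switch in the last command is an assumption based on older toolkits and may be deprecated or absent in yours, so check the ptxas documentation first. mysoftware.cu is the file name from the question. Whether either change helps depends on the kernel, so compare the -Xptxas=-v output after each build.

```shell
# Baseline from the question: IEEE-compliant math and ABI both enabled.
nvcc mysoftware.cu -Xptxas=-v -arch=sm_35

# Relax single-precision math: flush denormals to zero and use
# approximate division and square root instead of the IEEE-rounded ones.
nvcc mysoftware.cu -Xptxas=-v -arch=sm_35 -ftz=true -prec-div=false -prec-sqrt=false

# Assumption: older ptxas versions accepted -abi=no to turn off ABI
# compliance. Verify that your toolkit still supports it before use.
nvcc mysoftware.cu -Xptxas=-v -arch=sm_35 -Xptxas -abi=no
```

The relaxed math flags trade numerical accuracy for speed and only matter if the kernel actually performs floating-point division, square roots, or denormal arithmetic.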



Source: https://stackoverflow.com/questions/15053339/changing-the-arch-argument-in-cuda-makes-me-use-more-registers
