Amount of local memory per CUDA thread

温柔的废话 2020-12-17 04:47

I read in the NVIDIA documentation (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications, table #12) that there is a limit on the amount of local memory per CUDA thread.

1 Answer
  • 2020-12-17 05:12

    It seems you are running into not a local memory limitation but a stack size limitation:

    ptxas info : Function properties for _Z19kernel_test_privatePc
        65000 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads

    The variable that you had intended to be local is on the (GPU thread) stack, in this case.

    Based on the information provided by @njuffa here, the available stack size limit is the lesser of:

    1. The maximum local memory size (512KB for cc2.x and higher)
    2. GPU memory/(#of SMs)/(max threads per SM)

    Clearly, the first limit is not the issue. I assume you have a "standard" GTX580, which has 1.5GB of memory and 16 SMs. A cc2.x device supports a maximum of 1536 resident threads per multiprocessor. This gives us 1536MB/16/1536 = 96MB/1536 = 65536 bytes of stack per thread. There is some overhead and other memory usage that subtracts from the total available memory, so the actual stack size limit is somewhat below 65536 bytes, apparently somewhere between 60000 and 65000 in your case.

    I suspect a similar calculation on your GTX770 would yield a similar result, i.e. a maximum stack size somewhere between 200000 and 250000 bytes.
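    The budget calculation above can be sketched as a small helper. The GTX 770 parameters here (4GB of memory, 8 SMs, 2048 resident threads per SM) are assumptions consistent with the estimate above, since the thread does not state the card's exact configuration:

    ```c
    #include <stdio.h>

    /* Per-thread stack budget: total device memory divided by the number
       of SMs, divided by the maximum resident threads per SM. This is the
       raw upper bound, before subtracting overhead and other memory usage. */
    static long long stack_budget(long long mem_bytes, int num_sms, int threads_per_sm)
    {
        return mem_bytes / num_sms / threads_per_sm;
    }

    int main(void)
    {
        const long long mib = 1024LL * 1024;

        /* GTX 580: 1.5GB, 16 SMs, 1536 resident threads per SM (cc2.x) */
        printf("GTX 580: %lld bytes\n", stack_budget(1536 * mib, 16, 1536));   /* 65536 */

        /* Assumed 4GB GTX 770: 8 SMs, 2048 resident threads per SM (cc3.x) */
        printf("GTX 770: %lld bytes\n", stack_budget(4096 * mib, 8, 2048));    /* 262144 */

        return 0;
    }
    ```

    Note that both figures are upper bounds; the observed limits (roughly 60000–65000 and 200000–250000 bytes) come in lower once overhead is subtracted.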
