Where does CUDA allocate the stack frame for kernels?

眼角桃花 2020-12-19 06:41

My kernel call fails with "out of memory". It makes heavy use of the stack frame, and I was wondering whether this is the reason for the failure.

When invoking n

2 answers
  •  余生分开走
    2020-12-19 07:19

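    The stack is allocated in local memory, and the allocation is made per physical (resident) thread. On a GTX 480 that is 15 SMs * 1536 threads/SM = 23,040 threads. You are requesting 150,352 bytes per thread, which comes to 23,040 * 150,352 = 3,464,110,080 bytes, roughly 3.5 GB of stack space and more than the 1.5 GB of memory on the card. CUDA may reduce the maximum number of physical threads per launch when the per-thread size is that high. The CUDA language is not designed for a large per-thread stack.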
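
The estimate above can be reproduced for any device with a few runtime queries. The following is a minimal sketch, not a definitive tool: `estimateStackBytes` is a hypothetical helper introduced here for illustration, device 0 is assumed, and the result is only a worst-case bound under the assumption that every resident thread gets a full stack.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): worst-case local memory
// needed if every resident thread on the device gets its own stack.
unsigned long long estimateStackBytes(int smCount, int maxThreadsPerSM,
                                      size_t stackBytesPerThread) {
    return static_cast<unsigned long long>(smCount) * maxThreadsPerSM *
           stackBytesPerThread;
}

int main() {
    // Assumption: device 0 is the GPU of interest.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "no usable CUDA device found\n");
        return 1;
    }

    // Current per-thread stack size limit, in bytes.
    size_t stackBytes = 0;
    cudaDeviceGetLimit(&stackBytes, cudaLimitStackSize);

    // Free and total device memory, for comparison with the estimate.
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    unsigned long long needed = estimateStackBytes(
        prop.multiProcessorCount, prop.maxThreadsPerMultiProcessor, stackBytes);

    std::printf("SMs: %d, max resident threads/SM: %d\n",
                prop.multiProcessorCount, prop.maxThreadsPerMultiProcessor);
    std::printf("stack limit per thread: %zu bytes\n", stackBytes);
    std::printf("worst-case stack total: %llu bytes\n", needed);
    std::printf("device memory: %zu free of %zu bytes\n", freeBytes, totalBytes);

    // The figures from the answer: 15 SMs, 1536 threads/SM, 150,352 bytes each.
    std::printf("GTX 480 example: %llu bytes\n",
                estimateStackBytes(15, 1536, 150352));
    return 0;
}
```

If the per-thread limit has to change, `cudaDeviceSetLimit(cudaLimitStackSize, bytes)` sets it before the kernel launch; comparing the estimate against free device memory first gives an early hint of whether the launch is likely to fit.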

    In terms of registers, the GTX 480 is limited to 63 registers per thread and 32K (32,768) 32-bit registers per SM.
