Where does CUDA allocate the stack frame for kernels?

眼角桃花 2020-12-19 06:41

My kernel call fails with "out of memory". It makes heavy use of the stack frame, and I was wondering whether that is the reason for the failure.

When invoking n

2 answers
  •  温柔的废话
    2020-12-19 07:03

    The stack frame most likely lives in local memory, which is per-thread storage that physically resides in device memory.

    I believe there is a limit on local memory usage, but even without one, I think the CUDA driver may allocate local memory for more threads than just the single thread in your <<<1,1>>> launch configuration.
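
    To check whether the stack is what exhausts memory, the per-thread stack limit can be read and changed through the runtime API. The sketch below uses cudaDeviceGetLimit, cudaDeviceSetLimit and cudaGetDeviceProperties, which are real runtime calls. The function names and the estimate formula are assumptions for illustration: multiplying the per-thread stack size by the number of threads that can be resident on the device is only a rough guess at what the driver reserves, not a documented rule.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Rough guess (an assumption, not a documented formula): the driver reserves
// stack space for every thread that can be resident on the device at once.
size_t estimateStackReservation(size_t stackBytesPerThread,
                                int smCount,
                                int maxThreadsPerSm) {
    return stackBytesPerThread *
           static_cast<size_t>(smCount) *
           static_cast<size_t>(maxThreadsPerSm);
}

// Print the current per-thread stack limit and the rough device-wide estimate.
cudaError_t reportStackLimit() {
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    size_t stackBytes = 0;
    err = cudaDeviceGetLimit(&stackBytes, cudaLimitStackSize);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    size_t estimate = estimateStackReservation(
        stackBytes, prop.multiProcessorCount, prop.maxThreadsPerMultiProcessor);
    printf("stack per thread: %zu bytes, rough reservation: %zu bytes\n",
           stackBytes, estimate);
    return cudaSuccess;
}

// Change the per-thread stack limit; call this before launching the kernel.
cudaError_t setStackLimit(size_t bytesPerThread) {
    return cudaDeviceSetLimit(cudaLimitStackSize, bytesPerThread);
}
```

    Calling reportStackLimit() before and after setStackLimit() with a smaller value shows how quickly the total grows with the per-thread size, even when the launch itself uses a single thread.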

    Either way, even if you manage to get your code running, I fear it may be quite slow because of all those stack operations. Try to reduce the number of function calls (e.g. by inlining those functions).
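
    As a minimal sketch of the inlining suggestion: a device function can be marked __forceinline__ so that the compiler expands it at the call site instead of emitting a call that needs a stack frame. The helper scaleAndShift, the kernel transformKernel and the wrapper runTransform are hypothetical names, not taken from the question. The compiler often inlines small device functions on its own, so this mainly matters for larger functions; recursion and calls through function pointers cannot be inlined this way.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper. __forceinline__ asks the compiler to expand it at each
// call site; __host__ is added only so it can also be checked on the CPU.
__host__ __device__ __forceinline__
float scaleAndShift(float x, float scale, float shift) {
    return x * scale + shift;
}

// Each thread transforms one element; the helper call is expected to be inlined.
__global__ void transformKernel(float* data, int n, float scale, float shift) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scaleAndShift(data[i], scale, shift);
    }
}

// Host wrapper: copy in, launch, copy back, and report the first error seen.
cudaError_t runTransform(float* hostData, int n, float scale, float shift) {
    float* dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        transformKernel<<<blocks, threads>>>(dev, n, scale, shift);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hostData, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}
```

    Compiling with nvcc -Xptxas -v prints the stack frame size per kernel, which is a quick way to see whether a change like this actually reduced stack usage.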
