CUDA determining threads per block, blocks per grid

走了就别回头了 2020-12-02 06:06

I'm new to the CUDA paradigm. My question is about determining the number of threads per block and blocks per grid. Is there a bit of art and trial and error involved in this? What I've fou

4 Answers
  •  渐次进展
    2020-12-02 06:48

    You also need to consider shared memory, because only threads in the same block can access the same shared memory. If your design relies on many threads sharing data through shared memory, then more threads per block might be advantageous.
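    One way to make the shared-memory trade-off concrete is to ask how large a block can get before its shared-memory footprint exceeds the per-block budget. The sketch below is a hypothetical helper, not part of any CUDA API. It assumes each thread needs a fixed `bytesPerThread` of shared memory, and its defaults (48 KB per block, 1024 threads per block, warp size 32) are typical of recent NVIDIA GPUs. On real hardware, read the actual limits from the `sharedMemPerBlock`, `maxThreadsPerBlock` and `warpSize` fields that `cudaGetDeviceProperties` fills in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: largest block size (a whole number of warps) whose
// shared-memory footprint fits the per-block budget.
// Assumes every thread uses `bytesPerThread` bytes of shared memory.
// Returns 0 if not even one full warp fits.
int maxThreadsForSharedMem(std::size_t bytesPerThread,
                           std::size_t sharedMemPerBlock = 48 * 1024,
                           int maxThreadsPerBlock = 1024,
                           int warpSize = 32) {
    assert(bytesPerThread > 0 && warpSize > 0 && maxThreadsPerBlock > 0);
    // Threads allowed by the shared-memory budget alone.
    std::size_t bySharedMem = sharedMemPerBlock / bytesPerThread;
    // Never exceed the hardware limit on threads per block.
    std::size_t limit = std::min(bySharedMem,
                                 static_cast<std::size_t>(maxThreadsPerBlock));
    // Round down to a multiple of the warp size.
    std::size_t warp = static_cast<std::size_t>(warpSize);
    return static_cast<int>(limit / warp * warp);
}
```

    For example, with 64 bytes per thread the 48 KB budget allows 768 threads, while at 100 bytes per thread the result is rounded down from 491 to 480 so that the block still consists of whole warps.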

    For example, because threads are scheduled in warps of 32, any block size that is a multiple of 32 works just the same as far as context switching is concerned. So in the 1D case, launching 1 block with 64 threads or 2 blocks with 32 threads each makes no difference for global memory accesses. However, if the problem at hand naturally decomposes into a single length-64 vector, then the first option will be better than the second: there is less memory overhead, and every thread can access the same shared memory.
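    The warp arithmetic behind the 64-thread example can be sketched on the host side. `LaunchConfig` and `makeLaunchConfig` below are hypothetical names, and the warp size of 32 is an assumption matching current NVIDIA GPUs. The function uses the common ceiling-division idiom to get enough blocks to cover `n` elements. It then counts how many warps are scheduled in total.

```cpp
#include <cassert>

// Hypothetical summary of a 1D launch configuration for n elements.
struct LaunchConfig {
    int blocksPerGrid;
    int threadsPerBlock;
    int warpsPerBlock;
    int totalWarps;
};

// Assumes a warp size of 32 unless told otherwise.
LaunchConfig makeLaunchConfig(int n, int threadsPerBlock, int warpSize = 32) {
    assert(n > 0 && threadsPerBlock > 0 && warpSize > 0);
    LaunchConfig cfg;
    cfg.threadsPerBlock = threadsPerBlock;
    // Ceiling division: enough blocks that every element gets a thread.
    cfg.blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    // A partially filled warp still occupies a whole warp.
    cfg.warpsPerBlock = (threadsPerBlock + warpSize - 1) / warpSize;
    cfg.totalWarps = cfg.blocksPerGrid * cfg.warpsPerBlock;
    return cfg;
}
```

    For `n = 64`, both `makeLaunchConfig(64, 64)` (1 block of 2 warps) and `makeLaunchConfig(64, 32)` (2 blocks of 1 warp) schedule 2 warps in total, which is why they look the same from the scheduler's point of view. The arithmetic does not capture the shared-memory difference: only the single 64-thread block lets all 64 threads share one shared-memory array. A non-multiple of 32 such as 48 threads per block rounds up to 2 warps per block and ends up scheduling 4 warps for the same 64 elements. The resulting values would be passed to a launch such as `someKernel<<<cfg.blocksPerGrid, cfg.threadsPerBlock>>>(...)`, where `someKernel` is a placeholder.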
