CUDA determining threads per block, blocks per grid


走了就别回头了 2020-12-02 06:06

I'm new to the CUDA paradigm. My question is about determining the number of threads per block and blocks per grid. Does a bit of art and trial and error play into this? What I've fou…
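A common way to turn this into arithmetic is sketched below, under stated assumptions. Pick a block size that is a multiple of 32. Use round-up division so the grid covers all `n` elements, and bounds-check inside the kernel for the partly filled last block. Instead of hard-coding the block size, the sketch asks the CUDA runtime's `cudaOccupancyMaxPotentialBlockSize` for an occupancy-based suggestion. The kernel `scale` and the helpers `blocks_for` and `launch_scale` are hypothetical names for illustration, and error handling is kept minimal.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical element-wise kernel: each thread handles at most one element.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                      // the last block may be only partly used
        data[i] *= factor;
    }
}

// Round-up division: the smallest number of blocks that covers n elements.
int blocks_for(int n, int threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Asks the runtime for an occupancy-maximizing block size for `scale`,
// derives the grid size from n, and launches. Returns the launch status.
cudaError_t launch_scale(float* d_data, float factor, int n, int* used_block) {
    if (n <= 0) return cudaSuccess;   // a grid of 0 blocks is an invalid launch
    int min_grid = 0;                 // min grid size for full occupancy (unused here)
    int block = 0;                    // suggested threads per block
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(&min_grid, &block, scale, 0, 0);
    if (err != cudaSuccess) return err;
    int grid = blocks_for(n, block);
    scale<<<grid, block>>>(d_data, factor, n);
    if (used_block != nullptr) *used_block = block;
    return cudaGetLastError();        // reports launch-configuration errors
}
```

The occupancy suggestion is only a starting point. Timing a few candidate block sizes on the real kernel is still the usual way to settle the final choice, which is where the "trial" part of the question comes in.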

4 answers

渐次进展 (OP)

2020-12-02 06:48

You also need to consider shared memory, because threads in the same block can access the same shared memory. If you're designing something that requires a lot of shared memory, then more threads per block might be advantageous.

For example, in terms of context switching, any multiple of 32 (the warp size) works just the same. So for the 1D case, launching 1 block with 64 threads or 2 blocks with 32 threads each makes no difference for global memory accesses. However, if the problem at hand naturally decomposes into a single length-64 vector, then the first option will be better than the second (less memory overhead, and every thread can access the same shared memory).
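The 64-thread example can be made concrete with a small sketch: a kernel that reverses a vector by staging it in shared memory. Because each block has its own shared memory, launching it as 1 block of 64 threads reverses the whole vector, while 2 blocks of 32 threads each reverse only their own half. The kernel `reverse_in_block` and the host helper `run` are hypothetical, and the fixed length of 64 is an assumption chosen to match the example.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical kernel: reverses the elements a block can see, using shared
// memory as staging. Only when one block covers the whole vector is this a
// full reversal.
__global__ void reverse_in_block(const int* in, int* out) {
    __shared__ int s[64];                      // large enough for the biggest block used here
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    s[t] = in[base + t];
    __syncthreads();                           // all loads in this block are done
    out[base + t] = s[blockDim.x - 1 - t];     // reads only this block's shared memory
}

// Runs the kernel on the vector 0..63 with the given launch configuration
// and copies the 64 results into host_out.
void run(int blocks, int threads, int* host_out) {
    const int n = 64;
    assert(blocks * threads == n);             // this sketch assumes exact coverage
    int host_in[n];
    for (int i = 0; i < n; ++i) host_in[i] = i;
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, host_in, n * sizeof(int), cudaMemcpyHostToDevice);
    reverse_in_block<<<blocks, threads>>>(d_in, d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

With input 0..63, `run(1, 64, out)` yields 63, 62, …, 0, whereas `run(2, 32, out)` yields 31, …, 0 followed by 63, …, 32. Both configurations launch the same number of warps; only the shared-memory grouping differs.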
