Streaming multiprocessors, Blocks and Threads (CUDA)

走了就别回头了 2020-12-04 06:10

What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads?

What gets mapped to what, and what is parallelized?

4 answers

陌清茗 (original poster)

2020-12-04 06:43

The thread / block layout is described in detail in the CUDA programming guide. In particular, chapter 4 states:

The CUDA architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity. The threads of a thread block execute concurrently on one multiprocessor, and multiple thread blocks can execute concurrently on one multiprocessor. As thread blocks terminate, new blocks are launched on the vacated multiprocessors.
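To make the quoted description concrete, here is a minimal sketch of a launch whose grid is split into blocks that the hardware then hands out to whichever SMs have capacity. The names `fill_global_index`, `launch_and_count` and `sm_count` are hypothetical helpers made up for this illustration, and device 0 is assumed to be a CUDA-capable GPU.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread computes its position in the whole grid from the built-in
// block and thread indices, then writes that position to the output array.
__global__ void fill_global_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;  // the last block may be only partly used
}

// Hypothetical helper: launches enough blocks to cover n elements and returns
// how many elements hold the expected value (-1 on a CUDA error).
int launch_and_count(int n, int threads_per_block) {
    int blocks = (n + threads_per_block - 1) / threads_per_block;  // round up
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    // The programmer only picks the grid shape; which SM runs which block,
    // and in what order, is decided by the hardware scheduler.
    fill_global_index<<<blocks, threads_per_block>>>(d_out, n);

    std::vector<int> host(n);
    cudaError_t err = cudaMemcpy(host.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int i = 0; i < n; ++i) ok += (host[i] == i);
    return ok;
}

// Hypothetical helper: number of SMs on device 0 (-1 on a CUDA error).
int sm_count() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}
```

Calling `launch_and_count(1000, 128)` launches 8 blocks of 128 threads. If `sm_count()` reports fewer than 8 SMs, several blocks share an SM or wait for one to free up, yet the result is the same either way, which is the scalability the guide is describing.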

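On the early (compute capability 1.x) hardware this describes, each SM contains 8 CUDA cores, and at any one time they're executing a single warp of 32 threads, so it takes 4 clock cycles (32 threads / 8 cores) to issue a single instruction for the whole warp. Later architectures have many more cores per SM, so the cycle count differs, but the warp remains the unit of scheduling. On those early GPUs you can assume that threads in any given warp execute in lock-step; from Volta onwards threads are scheduled independently and that is no longer guaranteed, so use __syncwarp() within a warp. To synchronise across the warps of a block, you need to use __syncthreads().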
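As an illustration of where __syncthreads() is needed, here is a minimal sketch of a per-block sum in shared memory. A block of 128 threads spans 4 warps, and each step of the reduction reads values written by threads that may belong to another warp, so a block-wide barrier separates the steps. The names `block_sum`, `sum_on_device` and `kThreads` are hypothetical, and the block size of 128 is an assumption chosen for the example.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // assumed block size: 4 warps of 32 threads

// Sums the block's slice of `in` in shared memory, then adds the block total
// to *out. Must be launched with exactly kThreads threads per block.
__global__ void block_sum(const int *in, int *out, int n) {
    __shared__ int partial[kThreads];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();  // every warp's write must land before anyone reads

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();  // outside the `if`, so all threads reach the barrier
    }

    // Blocks cannot synchronise with each other via __syncthreads(), so the
    // per-block totals are combined with an atomic add instead.
    if (tid == 0) atomicAdd(out, partial[0]);
}

// Hypothetical host helper: returns the sum of h_in computed on device 0.
// Returns -1 on a CUDA error, so this sketch assumes the true sum is not -1.
int sum_on_device(const std::vector<int> &h_in) {
    int n = static_cast<int>(h_in.size());
    if (n == 0) return 0;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    int blocks = (n + kThreads - 1) / kThreads;
    block_sum<<<blocks, kThreads>>>(d_in, d_out, n);

    int result = -1;
    if (cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost)
            != cudaSuccess) {
        result = -1;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Removing either __syncthreads() call would let one warp read a shared-memory slot before the warp that owns it has written it. The barrier only covers the threads of one block, which is why the final combination across blocks uses atomicAdd rather than another barrier.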
