Streaming multiprocessors, Blocks and Threads (CUDA)

走了就别回头了 2020-12-04 06:10

What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads?

What gets mapped to what, and what is parallelized?

4 Answers
  •  陌清茗 (original poster)
    2020-12-04 06:43

    The thread / block layout is described in detail in the CUDA programming guide. In particular, chapter 4 states:

    The CUDA architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity. The threads of a thread block execute concurrently on one multiprocessor, and multiple thread blocks can execute concurrently on one multiprocessor. As thread blocks terminate, new blocks are launched on the vacated multiprocessors.
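
    To make the grid / block / thread mapping in that passage concrete, here is a minimal sketch. The kernel name recordMapping, the helper blocksFor and the sizes (1000 elements, 256 threads per block) are illustrative assumptions, not anything from the programming guide. It queries how many SMs the device has, launches a one-dimensional grid, and has every thread record which block it ran in. Which SM each block lands on is decided by the hardware scheduler and is not visible here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Hypothetical kernel: each thread stores the index of the block it belongs to.
__global__ void recordMapping(int *blockOf, int n)
{
    // Global index = block offset + position of the thread inside its block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {               // the last block may be only partly used
        blockOf[i] = blockIdx.x;
    }
}

// Hypothetical helper: number of blocks needed to cover n elements.
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("SMs: %d, warp size: %d, max threads per block: %d\n",
           prop.multiProcessorCount, prop.warpSize, prop.maxThreadsPerBlock);

    const int n = 1000;              // assumed problem size
    const int threadsPerBlock = 256; // assumed block size (8 warps of 32)
    const int blocks = blocksFor(n, threadsPerBlock);

    int *dBlockOf = nullptr;
    CHECK(cudaMalloc(&dBlockOf, n * sizeof(int)));

    // The grid's blocks are handed to whichever SMs have free capacity.
    recordMapping<<<blocks, threadsPerBlock>>>(dBlockOf, n);
    CHECK(cudaGetLastError());

    std::vector<int> blockOf(n);
    CHECK(cudaMemcpy(blockOf.data(), dBlockOf, n * sizeof(int),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dBlockOf));

    // Every element should have been handled by block i / threadsPerBlock.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (blockOf[i] != i / threadsPerBlock) ok = false;
    }
    printf("launched %d blocks, mapping %s\n", blocks, ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

    The block count is independent of the SM count: a device with fewer SMs than blocks simply runs the remaining blocks as earlier ones finish.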

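    On compute capability 1.x devices, each SM contains 8 CUDA cores and, at any one time, executes a single warp of 32 threads, so it takes 4 clock cycles to issue a single instruction for the whole warp. Later architectures have more cores per SM, but a warp is still 32 threads. On those older devices, threads in a given warp execute in lock-step; since Volta (compute capability 7.0), threads in a warp can be scheduled independently, so use __syncwarp() rather than assuming lock-step when warp-level ordering matters. To synchronise across the warps of a block, use __syncthreads(); it does not synchronise threads in different blocks.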
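
    As a sketch of where __syncthreads() is needed, the following block-level sum has several warps cooperate through shared memory. The names blockSum, deviceSum and kThreads, and the all-ones input, are assumptions made for illustration. It places a barrier after every step instead of relying on lock-step execution inside a warp, and it assumes the block size equals kThreads and is a power of two.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

constexpr int kThreads = 256; // assumed block size: 8 warps of 32 threads

// Hypothetical kernel: each block writes the sum of its slice of `in`.
__global__ void blockSum(const int *in, int *out, int n)
{
    __shared__ int scratch[kThreads];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    scratch[tid] = (i < n) ? in[i] : 0;
    __syncthreads(); // all warps must finish loading before anyone reads

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            scratch[tid] += scratch[tid + stride];
        }
        __syncthreads(); // reached by every thread, outside the if
    }
    if (tid == 0) {
        out[blockIdx.x] = scratch[0];
    }
}

// Hypothetical host wrapper: sums n ints on the device, one partial per block.
int deviceSum(const int *hostData, int n)
{
    const int blocks = (n + kThreads - 1) / kThreads;
    int *dIn = nullptr, *dOut = nullptr;
    CHECK(cudaMalloc(&dIn, n * sizeof(int)));
    CHECK(cudaMalloc(&dOut, blocks * sizeof(int)));
    CHECK(cudaMemcpy(dIn, hostData, n * sizeof(int), cudaMemcpyHostToDevice));

    blockSum<<<blocks, kThreads>>>(dIn, dOut, n);
    CHECK(cudaGetLastError());

    std::vector<int> partial(blocks);
    CHECK(cudaMemcpy(partial.data(), dOut, blocks * sizeof(int),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));

    // Blocks cannot synchronise with each other via __syncthreads(),
    // so the per-block partial sums are combined on the host.
    int total = 0;
    for (int p : partial) total += p;
    return total;
}

int main()
{
    const int n = 1000;
    std::vector<int> ones(n, 1);
    int total = deviceSum(ones.data(), n);
    printf("sum %s\n", total == n ? "ok" : "mismatch");
    return total == n ? 0 : 1;
}
```

    The barrier sits outside the `if (tid < stride)` branch on purpose: __syncthreads() must be reached by all threads of the block, otherwise the behaviour is undefined.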
