Are GPU/CUDA cores SIMD ones?

Submitted by 心已入冬 on 2019-12-03 12:23:05

Each warp is assigned to only one core (is that true?).

No, it's not true. A warp is a logical assembly of 32 threads of execution. To execute a single instruction from a single warp, the warp scheduler must usually schedule 32 execution units (or "cores", although the definition of a "core" is somewhat loose).

Cores are in fact scalar processors, not vector processors. 32 cores (or execution units) are marshalled by the warp scheduler to execute a single instruction, across 32 threads, which is where the "SIMT" moniker comes from.
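
For concreteness, here is a minimal CUDA sketch of that layout (the kernel name show_warp_layout and the launch configuration are illustrative, not from the question): each group of 32 consecutive threads in a block forms one warp, and all 32 lanes issue the same instruction together.

```cuda
#include <cstdio>

__global__ void show_warp_layout()
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    int lane = threadIdx.x % 32;  // position within the warp (0..31)
    int warp = threadIdx.x / 32;  // warp index within the block

    // All 32 threads of a warp execute this same instruction stream;
    // the warp scheduler issues each instruction across 32 execution
    // units ("cores") at once.
    if (lane == 0)
        printf("block %d, warp %d starts at global thread %d\n",
               blockIdx.x, warp, tid);
}

int main()
{
    show_warp_layout<<<2, 64>>>(); // 2 blocks of 64 threads = 2 warps each
    cudaDeviceSynchronize();
    return 0;
}
```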

CUDA "cores" can be thought of as SIMD lanes.

First let's recall that the term "CUDA core" is NVIDIA marketing-speak. These are not cores in the same sense that a CPU has cores. Similarly, "CUDA threads" are not the same as the threads we know on CPUs.

The equivalent of a CPU core on a GPU is a "streaming multiprocessor" (SM): it has its own instruction scheduler/dispatcher, its own L1 cache, its own shared memory, and so on. It is CUDA thread blocks, rather than warps, that are assigned to a GPU core, i.e. to a streaming multiprocessor. Within an SM, warps get selected to have instructions scheduled, for the entire warp at once.

From a CUDA perspective, those are 32 separate threads which are instruction-locked; but that's really no different from saying that a warp is like a single thread which only executes 32-lane-wide SIMD instructions. Of course this isn't a perfect analogy, but I feel it's pretty sound. One thing you don't quite have on CPU SIMD lanes is a mask of which lanes are actively executing: inactive lanes don't have the effects active lanes do, i.e. they don't set register values, perform memory writes, etc.
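
If the lane-masking behaviour is easier to see in code, here is a minimal sketch (the kernel name diverge and the host-side buffer handling are my own illustrative choices, not anything from the question). The warp runs both sides of the branch in turn; whichever lanes are inactive on a given side have their stores suppressed:

```cuda
#include <cstdio>

__global__ void diverge(int *out)
{
    int lane = threadIdx.x % 32;

    // The warp executes both sides of this branch one after the other.
    // While one side runs, the lanes belonging to the other side are
    // masked off, so their register writes and stores simply don't happen.
    if (lane < 16)
        out[threadIdx.x] = 1;   // lanes 0..15 active, 16..31 masked
    else
        out[threadIdx.x] = 2;   // lanes 16..31 active, 0..15 masked
}

int main()
{
    int *d_out, h_out[32];
    cudaMalloc(&d_out, 32 * sizeof(int));
    diverge<<<1, 32>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 32; ++i) printf("%d ", h_out[i]);
    printf("\n");
    cudaFree(d_out);
    return 0;
}
```

The output is sixteen 1s followed by sixteen 2s: every lane's store lands, but only while its side of the branch is the active one.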

I hope this makes intuitive sense to you (or perhaps you've figured this out yourself over the past 2 years).
