Are Kepler CC 3.0 GPU processors not only pipelined, but also superscalar? [closed]

Submitted by ぐ巨炮叔叔 on 2020-01-01 12:07:08

Question


The CUDA 6.5 documentation (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb) says:

5.2.3. Multiprocessor Level

...

  • 8L for devices of compute capability 3.x since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as mentioned in Compute Capability 3.x.
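The 8L figure in the quoted passage is a product of the issue rates it names. As a worked sketch, assuming L is the latency to be hidden, measured in clock cycles, and that the multiprocessor sustains the maximum issue rate the passage describes (four warps per clock, two instructions per warp):

```latex
\underbrace{4}_{\text{warps issued per clock}}
\times
\underbrace{2}_{\text{instructions per warp per clock}}
\times
\underbrace{L}_{\text{clock cycles of latency}}
= 8L \ \text{instructions}
```

With a purely illustrative L = 20, the multiprocessor would need 8 × 20 = 160 instructions issued during those 20 cycles to stay fully busy under this maximum-throughput assumption.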

Does this mean that Kepler CC 3.0 GPU processors are not only a pipelined architecture, but also superscalar?

  1. Pipelining - these two sequences execute in parallel (different operations at one time):

    • LOAD [addr1] -> ADD -> STORE [addr1] -> NOP
    • NOP -> LOAD [addr2] -> ADD -> STORE [addr2]
  2. Superscalar - these two sequences execute in parallel (the same operations at one time):

    • LOAD [reg1] -> ADD -> STORE [reg1]
    • LOAD [reg2] -> ADD -> STORE [reg2]
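The two definitions above can be made concrete with a toy cycle count. This is a minimal sketch under stated assumptions: `cycles_to_finish` is a hypothetical helper (not a CUDA API), the instructions are independent (unlike the dependent LOAD -> ADD -> STORE chains above), every instruction takes a fixed `depth` cycles, and nothing stalls.

```python
import math


def cycles_to_finish(n_instructions: int, depth: int, width: int = 1,
                     pipelined: bool = True) -> int:
    """Clock cycles until the last of n independent instructions completes.

    Toy model (an assumption, not real Kepler timing): each instruction spends
    `depth` cycles in its functional unit, there are no dependencies or stalls,
    and up to `width` instructions start per clock (width=2 ~ dual issue).
    """
    if n_instructions <= 0:
        return 0
    groups = math.ceil(n_instructions / width)  # batches of up to `width` instructions
    if not pipelined:
        # A non-pipelined unit accepts new work only after the previous batch finishes.
        return groups * depth
    # Pipelined: a new batch enters stage 1 every clock while older ones advance.
    return (groups - 1) + depth


n, depth = 8, 4  # hypothetical workload size and pipeline depth
print("non-pipelined:        ", cycles_to_finish(n, depth, pipelined=False))  # 32
print("pipelined, scalar:    ", cycles_to_finish(n, depth))                   # 11
print("pipelined, dual issue:", cycles_to_finish(n, depth, width=2))          # 7
```

Pipelining alone cuts the time from 32 to 11 cycles by overlapping instructions that sit in different stages; issuing two instructions per clock then halves the issue phase, bringing it down to 7.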

Answer 1:


Yes, the warp schedulers in Kepler can schedule two instructions per clock, as long as:

  1. the instructions are independent
  2. the instructions come from the same warp
  3. there are sufficient execution resources in the SM for both instructions
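The three conditions can be sketched as a predicate. This is a toy model, not NVIDIA's hardware logic: `Instr`, `can_dual_issue`, the register names, and the unit label `"fp32"` are all hypothetical, and real Kepler pairing rules involve details this answer does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    warp: int          # id of the warp the instruction belongs to
    unit: str          # functional-unit class it needs (hypothetical labels)
    dst: frozenset     # registers written
    src: frozenset     # registers read


def can_dual_issue(a: Instr, b: Instr, free_units: dict) -> bool:
    """Toy check of the three conditions above; not NVIDIA's actual issue logic."""
    # Same warp: a scheduler pairs two instructions from one warp.
    if a.warp != b.warp:
        return False
    # Independent: b must not read or overwrite a's result (RAW/WAW),
    # and, conservatively, b must not overwrite a register a reads (WAR).
    if a.dst & (b.src | b.dst) or b.dst & a.src:
        return False
    # Resources: enough free units of each required class this clock.
    needed = {}
    for unit in (a.unit, b.unit):
        needed[unit] = needed.get(unit, 0) + 1
    return all(free_units.get(unit, 0) >= count for unit, count in needed.items())


fmul = Instr(0, "fp32", frozenset({"r1"}), frozenset({"r2", "r3"}))
fadd = Instr(0, "fp32", frozenset({"r4"}), frozenset({"r5", "r6"}))
uses_r1 = Instr(0, "fp32", frozenset({"r7"}), frozenset({"r1"}))
warp1_op = Instr(1, "fp32", frozenset({"r8"}), frozenset({"r9"}))

print(can_dual_issue(fmul, fadd, {"fp32": 2}))      # True
print(can_dual_issue(fmul, uses_r1, {"fp32": 2}))   # False: reads r1 written by fmul
print(can_dual_issue(fmul, warp1_op, {"fp32": 2}))  # False: different warps
print(can_dual_issue(fmul, fadd, {"fp32": 1}))      # False: only one free fp32 unit
```

Each `return False` corresponds to one of the listed conditions failing; only a pair that passes all three is issued in the same clock.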

If that fits your definition of superscalar, then it is superscalar.

With respect to pipelining, I view it differently. Various execution units in the Kepler SM are pipelined. Let's take a floating-point multiply as an example.

In a given clock, a Kepler warp scheduler may schedule a floating-point multiply operation on a floating-point unit. The results of this operation may not appear until some number of clocks later (i.e. they are not available on the next clock cycle), but on the next clock cycle a new floating-point operation can be scheduled on the very same floating-point functional units, because the hardware (the floating-point units, in this case) is pipelined.

clock    operation    pipeline stage   result
0           MPY1   ->   PS1
1                       PS2
...                     ...
N-1                     PSN         ->  result1

On the very next clock after clock 0, a new multiply instruction can be scheduled on the same hardware, and its result will appear on the cycle after result1 appears.
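The table above can be reproduced with a small simulation of a single pipelined unit. It is a sketch assuming a fixed, hypothetical depth N = 4 and one instruction per stage per clock; `pipeline_timeline` and `result_clock` are illustrative names, and no real Kepler latency is implied.

```python
def pipeline_timeline(issue_clocks: dict, depth: int) -> dict:
    """Stage occupancy of ONE pipelined functional unit, clock by clock.

    Toy model: an instruction issued at clock t sits in stage PS1 at t,
    PS2 at t+1, ..., PS<depth> at t+depth-1. Returns {clock: {stage: name}}.
    """
    timeline = {}
    for name, t in issue_clocks.items():
        for stage in range(1, depth + 1):
            slot = timeline.setdefault(t + stage - 1, {})
            if stage in slot:
                # Each stage holds one instruction per clock, so two issues
                # to the same unit in the same clock would collide.
                raise ValueError(f"{name} and {slot[stage]} collide in PS{stage}")
            slot[stage] = name
    return timeline


def result_clock(issue_clock: int, depth: int) -> int:
    """Clock on which the result leaves the last stage (N-1 for issue at 0)."""
    return issue_clock + depth - 1


N = 4  # hypothetical depth; real Kepler latencies are not modeled here
tl = pipeline_timeline({"MPY1": 0, "MPY2": 1}, depth=N)
for clock in sorted(tl):
    print(clock, ", ".join(f"PS{s}={op}" for s, op in sorted(tl[clock].items())))
print("result1 at clock", result_clock(0, N), "| result2 at clock", result_clock(1, N))
```

MPY2 trails MPY1 by exactly one stage on every clock, so result2 appears on clock 4, one cycle after result1 on clock 3 (that is, N-1 with N = 4).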

I'm not sure if this is what you meant by "different operations at one time".



Source: https://stackoverflow.com/questions/28032470/are-gpu-kepler-cc3-0-processors-not-only-pipelined-architecture-but-also-supers
