How many threads per core are assumed when calculating GFLOPS of Nvidia GPU cards?

情话喂你 2021-01-24 22:36

I want to work out how many nanoseconds it takes to execute one double-precision FLOP on a GeForce GTX 550 Ti.

In order to do that I am following this

2 answers
  •  梦谈多话
    2021-01-24 23:23

    Compute capability 2.1 devices have a double-precision throughput of 4 operations per cycle per SM (8 FLOPs per cycle if the operations are DFMAs). This assumes all 32 threads are active in the dispatched warp.

    4 ops/cycle/SM * 4 SMs * 1800 MHz * 2 FLOPs/DFMA = 57.6 GFLOPS double

    The calculation assumes all threads in a warp are active.
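As a rough check of the arithmetic, here is a minimal Python sketch that multiplies out the figures quoted in this answer (4 DFMA per cycle per SM, 4 SMs, 1800 MHz shader clock, 2 FLOPs per DFMA). The function names `peak_dp_gflops` and `ns_per_flop` are made up for illustration. The nanosecond figure it prints is an amortized value at peak throughput with every warp fully active. It is not the latency of a single dependent operation.

```python
# Peak double-precision throughput from the figures quoted in the answer.
# Helper names are hypothetical; the inputs are the answer's numbers.

def peak_dp_gflops(dfma_per_cycle_per_sm, num_sms, shader_clock_mhz, flops_per_dfma=2):
    """Peak double-precision GFLOPS, assuming all 32 threads of every warp are active."""
    flops_per_second = (
        dfma_per_cycle_per_sm * num_sms * shader_clock_mhz * 1e6 * flops_per_dfma
    )
    return flops_per_second / 1e9


def ns_per_flop(gflops):
    """Amortized time per FLOP: 1 GFLOPS is one FLOP per nanosecond."""
    return 1.0 / gflops


gtx_550_ti = peak_dp_gflops(dfma_per_cycle_per_sm=4, num_sms=4, shader_clock_mhz=1800)
print(f"peak: {gtx_550_ti:.1f} GFLOPS double")                  # peak: 57.6 GFLOPS double
print(f"amortized: {ns_per_flop(gtx_550_ti):.4f} ns per FLOP")  # amortized: 0.0174 ns per FLOP
```

The product comes to 57.6 GFLOPS, or about 0.017 ns per FLOP when the pipeline is kept full. A single isolated operation takes much longer, because it pays the full instruction latency.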

    The code in your question contains two dependent operations that could be fused into a single DFMA. Use cuobjdump -sass to examine the generated assembly. If you launch multiple warps on the same SM, the test becomes a measure of dependent-instruction throughput, not latency.
