How many threads per core are assumed when calculating GFLOPS of Nvidia GPU cards?


情话喂你 2021-01-24 22:36

I am interested in obtaining the number of nanoseconds it would take to execute 1 double-precision FLOP on a GeForce GTX 550 Ti.

In order to do that I am following this

2 Answers

梦谈多话 (楼主)

2021-01-24 23:23

Compute capability 2.1 devices have a double-precision throughput of 4 operations per cycle per SM (8 FLOPs per cycle if the operations are DFMAs). This assumes all 32 threads are active in the dispatched warp.

4 ops/cycle/SM * 4 SMs * 1800 MHz * 2 ops/DFMA = 57.6 GFLOPS double

The calculation assumes all threads in a warp are active.
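As a quick check of the arithmetic, here is a minimal Python sketch that multiplies out the figures quoted above (4 ops/cycle/SM, 4 SMs, 1800 MHz, 2 FLOPs per DFMA), which comes to about 57.6 GFLOPS. The function name `peak_gflops` and its parameters are made up for illustration. Note that the reciprocal it prints is an average time per FLOP across the whole GPU at peak throughput, not the time a single operation takes to complete.

```python
def peak_gflops(ops_per_cycle_per_sm, sm_count, clock_mhz, flops_per_op=1):
    """Peak throughput in GFLOPS, assuming every thread of each issued warp is active."""
    flops_per_second = ops_per_cycle_per_sm * sm_count * clock_mhz * 1e6 * flops_per_op
    return flops_per_second / 1e9


# Figures taken from the answer: 4 DP ops/cycle/SM, 4 SMs, 1800 MHz shader clock.
dfma_gflops = peak_gflops(4, 4, 1800, flops_per_op=2)  # a DFMA counts as 2 FLOPs
plain_gflops = peak_gflops(4, 4, 1800)                  # non-fused ops: 1 FLOP each

# Reciprocal of aggregate throughput: average ns per FLOP with the GPU fully busy.
# This is NOT the latency of one dependent operation.
ns_per_flop = 1.0 / dfma_gflops

print(f"peak (DFMA):     {dfma_gflops:.1f} GFLOPS")
print(f"peak (no fusing): {plain_gflops:.1f} GFLOPS")
print(f"average time per FLOP at peak: {ns_per_flop:.4f} ns")
```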

The code in your question contains two dependent operations that could be fused into a DFMA. Use cuobjdump -sass to examine the assembly. If you launch multiple warps on the same SM, the test becomes a measure of dependent-instruction throughput rather than latency.
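If the test is run with a single warp and timed in clock cycles, the cycle count can be converted to nanoseconds per dependent operation. The sketch below shows only that conversion, assuming the 1800 MHz clock from the calculation above. The helper names and the input numbers (20000 cycles for a chain of 1000 operations) are hypothetical placeholders, not measurements from a GTX 550 Ti.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1e3 / clock_mhz


def ns_per_dependent_op(elapsed_cycles, chain_length, clock_mhz):
    """Average time per operation in a chain where each op waits on the previous one."""
    return cycles_to_ns(elapsed_cycles, clock_mhz) / chain_length


CLOCK_MHZ = 1800  # clock rate used in the answer's calculation

# One cycle at 1800 MHz lasts roughly 0.556 ns.
print(f"one cycle: {cycles_to_ns(1, CLOCK_MHZ):.3f} ns")

# Hypothetical inputs for illustration only: 20000 cycles for 1000 dependent ops.
example_ns = ns_per_dependent_op(elapsed_cycles=20000, chain_length=1000, clock_mhz=CLOCK_MHZ)
print(f"per dependent op (placeholder data): {example_ns:.2f} ns")
```

With several warps resident on the same SM, the same division would instead give a throughput figure, since the scheduler overlaps the chains of different warps.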
