How to evaluate CUDA performance?

自闭症患者 2021-01-05 20:41

I wrote my own CUDA kernel. Compared to the CPU code, my kernel is 10 times faster.

But I have a question about my experiments.

Does my program

2 answers

慢半拍i (OP)

2021-01-05 20:47

The preferred measure of performance is elapsed time. GFLOP/s can serve as a point of comparison, but comparisons across compilers and architectures are often misleading because of differences in instruction sets, compiler code generation, and the way FLOPs are counted.
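
To make the relationship between elapsed time, GFLOP/s and speedup concrete, here is a minimal sketch in standard C++. It assumes a SAXPY-style kernel (y[i] = a * x[i] + y[i]) purely as an example, and the helper names are hypothetical. The FLOP count is itself a convention: this sketch counts the multiply and the add as 2 FLOPs per element, and another tool may count a fused multiply-add differently, which is one reason GFLOP/s figures are hard to compare.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the thread): FLOP count for a SAXPY-style
// kernel y[i] = a * x[i] + y[i]. The multiply and the add are counted as
// 2 FLOPs per element, whether or not the compiler fuses them into one FMA.
double saxpy_flops(std::size_t n) {
    return 2.0 * static_cast<double>(n);
}

// Throughput in GFLOP/s from a FLOP count and an elapsed time in seconds.
double gflops_per_second(double flops, double elapsed_s) {
    assert(elapsed_s > 0.0);
    return flops / elapsed_s / 1e9;
}

// Speedup of the GPU version over the CPU baseline, where both times are
// elapsed wall-clock time for the same input.
double speedup(double cpu_elapsed_s, double gpu_elapsed_s) {
    assert(gpu_elapsed_s > 0.0);
    return cpu_elapsed_s / gpu_elapsed_s;
}
```

For example, n = 2^20 elements finishing in 0.5 ms gives 2,097,152 FLOPs / 0.0005 s, or about 4.19 GFLOP/s. The speedup, by contrast, needs no FLOP convention at all, which is why elapsed time is the more robust comparison.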

The best method is to time the application itself. For the CUDA code, time everything that happens on each launch, including memory copies between host and device and any synchronization.
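
Timing everything per launch can be sketched as a small host-side harness in standard C++. It wraps one complete launch in a callable, discards a few warm-up runs, and reports the median of the timed runs. The function name and the warm-up/median policy are assumptions for illustration. In a real CUDA program the callable would contain the cudaMemcpy calls, the kernel launch, and a closing cudaDeviceSynchronize(), because a kernel launch returns to the host before the GPU has finished. CUDA events (cudaEventRecord, cudaEventElapsedTime) are a common alternative for timing on the device's own timeline.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness: times one complete "launch" as defined by the caller
// and returns the middle sample of `reps` runs in milliseconds (the upper of
// the two middle samples when reps is even), after `warmup` untimed runs.
//
// In a CUDA build, `launch` is assumed to do the host-to-device copy, the
// kernel launch, the device-to-host copy, and a final cudaDeviceSynchronize(),
// so the host clock only stops once the GPU work is really finished.
template <class F>
double median_elapsed_ms(F&& launch, int warmup, int reps) {
    assert(reps > 0);
    for (int i = 0; i < warmup; ++i) launch();  // absorb one-time setup costs
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        launch();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Comparing this number with the CPU version's elapsed time gives an end-to-end speedup that includes transfer costs, rather than a kernel-only figure.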

Nsight Visual Studio Edition and the Visual Profiler provide the most accurate measurement of each operation. Nsight Visual Studio Edition also reports the theoretical bandwidth and FLOPs of each device, and its Achieved FLOPs experiment can capture the FLOP count for both single and double precision.
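
Profilers report these numbers directly, but it can help to cross-check them by hand. The sketch below, with hypothetical helper names, computes effective memory bandwidth from the bytes a kernel moves and its measured time, and expresses any achieved figure as a fraction of a peak value. The peak is an input: it is assumed to come from the device specification or the theoretical value the profiler reports. The SAXPY byte counts (read x and y, write y, all float) are an illustrative assumption, not something the thread specifies.

```cpp
#include <cassert>
#include <cstddef>

// Effective (achieved) memory bandwidth in GB/s: bytes read plus bytes
// written by the kernel, divided by its measured elapsed time in seconds.
double effective_bandwidth_gbs(double bytes_read, double bytes_written, double elapsed_s) {
    assert(elapsed_s > 0.0);
    return (bytes_read + bytes_written) / elapsed_s / 1e9;
}

// Fraction of a peak value (bandwidth or FLOP/s) actually achieved.
// `peak` is supplied by the caller, e.g. a theoretical figure from a profiler.
double fraction_of_peak(double achieved, double peak) {
    assert(peak > 0.0);
    return achieved / peak;
}

// Hypothetical byte counts for single-precision SAXPY: read x and y, write y.
double saxpy_bytes_read(std::size_t n) { return 2.0 * sizeof(float) * n; }
double saxpy_bytes_written(std::size_t n) { return 1.0 * sizeof(float) * n; }
```

For example, 6 GB read and 3 GB written in 1 s is 9 GB/s; against a hypothetical 900 GB/s peak, fraction_of_peak(9.0, 900.0) gives 0.01.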
