A Method of counting Floating Point Operations in a C++/CUDA Program using PTX

ぐ巨炮叔叔 提交于 2019-12-25 02:57:09

问题


I have a somewhat large CUDA application and I need to calculate the attained GFLOPs. I'm looking for an easy and perhaps generic way of counting the number of floating point operations.

Is it possible to count floating point operations from the generated PTX code (as shown below), using a list of predefined fpo in assembly language? Based on the code, can the counting be made generic? For example, does add.s32 %r58, %r8, -2; count as one floating point operation?

EXAMPLE:

BB3_2:
.loc 2 108 1
mov.u32         %r8, %r79;
setp.ge.s32     %p1, %r78, %r16;
setp.lt.s32     %p2, %r78, 0;
or.pred         %p3, %p2, %p1;
@%p3 bra        BB3_5;

add.s32         %r58, %r8, -2;
setp.lt.s32     %p4, %r58, 0;
setp.ge.s32     %p5, %r58, %r15;
or.pred         %p6, %p4, %p5;
@%p6 bra        BB3_5;

.loc 2 112 1
ld.global.u8    %rc1, [%rd17];
cvt.rn.f32.u8   %f11, %rc1;
mul.wide.u32    %rd12, %r80, 4;
add.s64         %rd13, %rd7, %rd12;
ld.local.f32    %f12, [%rd13];
fma.rn.f32      %f14, %f11, %f12, %f14;
.loc 2 113 1
add.f32         %f15, %f15, %f12;

Or are there far simpler ways of counting FPOs and this is a waste of time?


回答1:


The easiest way to count FLOPS would be to have the CUDA profiler do it for you. By selecting the Achieved FLOPS experiment, you can get charts like this:

The Floating Point Operations chart displays a count of each type of floating point operation executed by your kernel.



来源:https://stackoverflow.com/questions/14812446/a-method-of-counting-floating-point-operations-in-a-c-cuda-program-using-ptx

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!