How to prevent FTZ for a single line in CUDA

Submitted by 江枫思渺然 on 2019-12-12 13:33:26

Question


I am working on a particle code that relies heavily on flush-to-zero (FTZ) for performance. However, there is a single floating-point comparison that I do not want flushed. One solution is inline PTX, but PTX has no boolean type, only predicate registers, so the comparison result has to be converted to an integer and then tested again, which introduces unnecessary instructions. C++ code:

float a, b;
if ( a < b ) do_something;
// compiles into SASS:
//     FSETP.LT.FTZ.AND P0, PT, A, B, PT;
// @P0 DO_SOMETHING 

PTX:

float a, b;
unsigned int p;
// Non-FTZ compare into a PTX predicate, then materialize it as 1/0 in p
asm("{.reg .pred p; setp.lt.f32 p, %1, %2; selp.u32 %0, 1, 0, p;}" : "=r"(p) : "f"(a), "f"(b));
if (p) do_something;
// compiled into SASS:
//     FSETP.LT.AND P0, PT, A, B, PT;
//     SEL R2, RZ, 0x1, !P0;
//     ISETP.NE.AND P0, PT, R2, RZ, PT;
// @P0 DO_SOMETHING 

Is there a way that I can do the non-FTZ comparison with a single instruction without coding the entire thing in PTX/SASS?

Source: https://stackoverflow.com/questions/29563307/how-to-prevent-ftz-for-a-single-line-in-cuda
