Is it possible to put assembly instructions into CUDA code?

谁都会走 提交于 2019-12-05 05:16:29
Matias Valdenegro

No, you can't, there is nothing like the asm constructs from C/C++. What you can do is tweak the generated PTX assembly and then use it with CUDA.

See this for an example.

But for GPUs, assembly optimizations are NOT necessary, you should do other optimizations first, such as memory coalescency and occupancy. See the CUDA Best Practices guide for more information.

Since CUDA 4.0, inline PTX is supported by the CUDA toolchain. There is a document in the toolkit that describes it: Using_Inline_PTX_Assembly_In_CUDA.pdf

Below is some code demonstrating use of inline PTX in CUDA 4.0. Note that this code should not be used as a replacement for CUDA's built-in __clz() function, I merely wrote it to explore aspects of the new inline PTX capability.

__device__ __forceinline__ int my_clz (unsigned int x)
{
    int res;

    asm ("{\n"
         "        .reg .pred iszero, gezero;\n"
         "        .reg .u32 t1, t2;\n"
         "        mov.b32         t1, %1;\n"
         "        shr.u32         %0, t1, 16;\n"
         "        setp.eq.b32     iszero, %0, 0;\n"
         "        mov.b32         %0, 0;\n"
         "@iszero shl.b32         t1, t1, 16;\n"
         "@iszero or.b32          %0, %0, 16;\n"
         "        and.b32         t2, t1, 0xff000000;\n"
         "        setp.eq.b32     iszero, t2, 0;\n"
         "@iszero shl.b32         t1, t1, 8;\n"
         "@iszero or.b32          %0, %0, 8;\n"
         "        and.b32         t2, t1, 0xf0000000;\n"
         "        setp.eq.b32     iszero, t2, 0;\n"
         "@iszero shl.b32         t1, t1, 4;\n"
         "@iszero or.b32          %0, %0, 4;\n"
         "        and.b32         t2, t1, 0xc0000000;\n"
         "        setp.eq.b32     iszero, t2, 0;\n"
         "@iszero shl.b32         t1, t1, 2;\n"
         "@iszero or.b32          %0, %0, 2;\n"
         "        setp.ge.s32     gezero, t1, 0;\n"
         "        setp.eq.b32     iszero, t1, 0;\n"
         "@gezero or.b32          %0, %0, 1;\n"
         "@iszero add.u32         %0, %0, 1;\n\t"
         "}"
         : "=r"(res)
         : "r"(x));
    return res;
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!