Syntax on inline PTX code for CUDA

Posted by 孤街浪徒 on 2019-12-11 13:45:59

Question


According to NVIDIA's Inline PTX Assembly document, the syntax for inline assembly is: asm("temp_string" : "constraint"(output) : "constraint"(input));
Here are two examples:
asm("vadd.s32.s32.s32 %0, %1.h0, %2.h0;" : "=r"(v) : "r"(a), "r"(b));
asm("vadd.u32.u32.u32 %0.b0, %1, %2, %3;" : "=r"(v) : "r"(a), "r"(b), "r"(z));
In both examples, a suffix such as h0 or b0 follows a %n operand. I looked through CUDA's official documentation and found nothing about the meaning of h0 or b0. I have seen h0, h1 and b0, b1, b2, b3. My guess is that h0 and h1 denote 16-bit values, while bn denotes a byte. Does anyone know their exact meaning?

Update: thanks to Roger Dahl for the help. I read the PTX ISA 3.0 document and found the answer.
"h" means half-word: h0 is the low half-word of a 32-bit word and h1 is the high half-word. "b" means byte: b0, b1, b2 and b3 are the lowest, second, third and highest 8 bits of a 32-bit word.
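To make the source selectors concrete, here is a minimal host-side sketch in plain C++ (no GPU needed) of what a selector on an input operand does: it picks a byte or half-word out of the 32-bit register and sign- or zero-extends it according to the operand type. The names Sel, part_select and vadd_s32_h0_h0 are hypothetical helpers made up for this illustration; they model the behaviour described in the PTX ISA and are not NVIDIA's code.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; W means "whole word, no selector".
enum class Sel { W, H0, H1, B0, B1, B2, B3 };

// Extract the selected part of a 32-bit word, then sign- or zero-extend it.
constexpr std::int64_t part_select(std::uint32_t word, Sel sel, bool is_signed) {
    unsigned shift = 0;
    unsigned bits = 32;
    switch (sel) {
        case Sel::W:  break;
        case Sel::H0: bits = 16; shift = 0;  break;  // low half-word
        case Sel::H1: bits = 16; shift = 16; break;  // high half-word
        case Sel::B0: bits = 8;  shift = 0;  break;  // lowest byte
        case Sel::B1: bits = 8;  shift = 8;  break;
        case Sel::B2: bits = 8;  shift = 16; break;
        case Sel::B3: bits = 8;  shift = 24; break;  // highest byte
    }
    const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
    const std::uint64_t part = (std::uint64_t{word} >> shift) & mask;
    const std::uint64_t sign_bit = std::uint64_t{1} << (bits - 1);
    if (is_signed && (part & sign_bit)) {
        // Negative value: subtract 2^bits to sign-extend.
        return static_cast<std::int64_t>(part) - static_cast<std::int64_t>(mask) - 1;
    }
    return static_cast<std::int64_t>(part);
}

// Models "vadd.s32.s32.s32 %0, %1.h0, %2.h0;" without .sat:
// add the sign-extended low half-words of a and b.
constexpr std::int32_t vadd_s32_h0_h0(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int32_t>(part_select(a, Sel::H0, true) +
                                     part_select(b, Sel::H0, true));
}

// Low half-word 0xFFFE is -2 as .s32 and 65534 as .u32.
static_assert(part_select(0x1234FFFEu, Sel::H0, true) == -2, "signed h0");
static_assert(part_select(0x1234FFFEu, Sel::H0, false) == 65534, "unsigned h0");
static_assert(part_select(0xAABBCCDDu, Sel::B2, false) == 0xBB, "b2 is the third byte");
static_assert(vadd_s32_h0_h0(0x0001FFFEu, 0x00050003u) == 1, "-2 + 3");
```

In the first example from the question, the upper 16 bits of a and b are therefore ignored entirely; only the low half-words take part in the addition.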


Answer 1:


vadd is one of the video-specific instructions included in PTX. A description of the complete PTX ISA ships with the CUDA distribution. On my machine, it is in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\doc\ptx_isa_3.0.pdf. The h0, h1, b0, etc. designators are described in section 8.7.11, Video Instructions. They represent different implicit shift/mask operations (see the optMerge function).
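A selector on the destination operand works differently from one on a source: the result is truncated to a byte or half-word and merged into that position of the extra operand (the fourth one, z, in the question's second example), leaving its other bits untouched. The sketch below is a host-side C++ rendering of that shift/mask step, written from the optMerge pseudocode as I understand it from the PTX ISA. The names DstSel, opt_merge and vadd_u32_b0 are hypothetical, and the sketch assumes no .sat modifier.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; W means "no destination selector".
enum class DstSel { W, H0, H1, B0, B1, B2, B3 };

// Truncate tmp to the selected width and merge it into c at that position.
constexpr std::uint32_t opt_merge(DstSel dsel, std::uint32_t tmp, std::uint32_t c) {
    switch (dsel) {
        case DstSel::H0: return (tmp & 0xFFFFu)         | (c & 0xFFFF0000u);
        case DstSel::H1: return ((tmp & 0xFFFFu) << 16) | (c & 0x0000FFFFu);
        case DstSel::B0: return (tmp & 0xFFu)           | (c & 0xFFFFFF00u);
        case DstSel::B1: return ((tmp & 0xFFu) << 8)    | (c & 0xFFFF00FFu);
        case DstSel::B2: return ((tmp & 0xFFu) << 16)   | (c & 0xFF00FFFFu);
        case DstSel::B3: return ((tmp & 0xFFu) << 24)   | (c & 0x00FFFFFFu);
        case DstSel::W:  break;
    }
    return tmp;  // no selector: the full 32-bit result is written
}

// Models "vadd.u32.u32.u32 %0.b0, %1, %2, %3;" without .sat:
// add a and b, keep the low byte of the sum, merge it into byte 0 of c.
constexpr std::uint32_t vadd_u32_b0(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return opt_merge(DstSel::B0, a + b, c);
}

// 0xF0 + 0x25 = 0x115; only 0x15 lands in byte 0, the rest of c is preserved.
static_assert(vadd_u32_b0(0xF0u, 0x25u, 0xAABBCCDDu) == 0xAABBCC15u, "merge into b0");
static_assert(opt_merge(DstSel::H1, 0x12345678u, 0xAABBCCDDu) == 0x5678CCDDu, "merge into h1");
```

This is why the second example in the question takes a fourth operand: it supplies the bits that the merge leaves untouched.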



Source: https://stackoverflow.com/questions/11546221/syntax-on-inline-ptx-code-for-cuda
