Does 'code=sm_X' embed only binary (cubin) code, or also PTX code, or both?

℡╲_俬逩灬. Posted on 2019-11-29 08:22:11

nvcc accepts several formats for specifying code generation options. Reading section 6 of the nvcc manual may be instructive.

When using this format:

nvcc -gencode arch=compute_13,code=sm_13 ...

only the SASS code for an sm_13 (cc 1.3) device will be retained. No PTX is retained in the executable object, so the code can only run on a device capable of running cc 1.3 SASS.

Using the above command format, in order to embed a PTX version of the source code into the executable object, it's necessary to use a virtual architecture specification for the option provided to code=.... Since this particular format (using -gencode) does not allow specification of multiple targets in a single switch, we must pass the -gencode switch multiple times to nvcc, one for each target we desire to be embedded in the executable object.

So extending the above example, we could use the following:

nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_13,code=compute_13 ...

This would embed both cc 1.3 SASS (from the first -gencode switch) and cc 1.3 PTX (from the second -gencode switch) in the executable. Devices capable of running cc 1.3 SASS code directly will use that. Other devices (of compute capability greater than cc 1.3) will rely on a JIT-compile step in the driver, which converts the cc 1.3 PTX code to SASS code for the architecture of the device in question.
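One way to see which path a given build takes is to ask the runtime what it loaded for a kernel. The following is a minimal sketch, not a definitive diagnostic: `dummy_kernel` is a hypothetical placeholder, device 0 is assumed, and the build flags are left to the reader (sm_13 is not accepted by recent toolkits, so substitute an architecture your toolkit supports). It prints the `ptxVersion` and `binaryVersion` fields of `cudaFuncAttributes` next to the device's compute capability.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only so there is something to query.
__global__ void dummy_kernel() {}

int main() {
    // Query the compute capability of device 0 (assumed to exist).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask the runtime which PTX and binary versions it holds for the kernel.
    // This call itself fails if no usable SASS or PTX is embedded for the device.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, dummy_kernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("device compute capability: %d.%d\n", prop.major, prop.minor);
    // Both values are encoded as major * 10 + minor.
    printf("ptxVersion: %d, binaryVersion: %d\n", attr.ptxVersion, attr.binaryVersion);
    return 0;
}
```

If the reported `binaryVersion` matches the device rather than the `code=sm_X` target you compiled for, that suggests the driver JIT-compiled the embedded PTX; treat this as a hint rather than proof.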

I agree that the GTC 2013 presentation (e.g. slide 37) seems to suggest that

nvcc -gencode arch=compute_13,code=sm_13 ...

is sufficient for all devices of compute capability 1.3 or higher. It is not, and this is easy to demonstrate. If you compile code using the above format and attempt to run it on a cc 2.0 device, it will fail with an "invalid device function" error associated with any kernel or kernels you have in your code.
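The failure only becomes visible if the program checks for errors after the launch. Here is a minimal sketch of such a check, assuming a hypothetical empty kernel named `noop_kernel`; build it with a SASS-only `-gencode` switch for an architecture older than your device to reproduce the situation described above. The exact error text may differ between toolkit versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel; any kernel would behave the same way here.
__global__ void noop_kernel() {}

int main() {
    noop_kernel<<<1, 1>>>();

    // Launch-time errors (such as a missing device image) are reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Errors raised while the kernel executes are reported after synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("kernel ran successfully\n");
    return 0;
}
```

Without these checks the program would appear to run normally while the kernel silently never executes.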

Again, nvcc has a variety of command formats and "shortcuts" for specifying code generation. Some relatively simple ones, such as:

nvcc -arch=sm_13 ...

will embed both a PTX and a SASS version of the code in the executable object, resulting in the kind of forward compatibility suggested. This form is shorthand for -arch=compute_13 -code=sm_13,compute_13.
