I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions right after each other and it is difficult to write a cud
This sequence of nvcc commands seems to do the trick. Please see here for more details.
Create the ptx files, then make your modifications to them
nvcc file1.cu file2.cu file3.cu -rdc=true --ptx
Link ptx files into an object file
nvcc file1.ptx file2.ptx file3.ptx -dlink
I did this on Windows, so it produced a_dlink.obj. As the documentation points out, host code has been discarded by this point. Run
nvcc file1.cu file2.cu file3.cu -rdc=true --compile
to create object files. They will be .obj files on Windows or .o files on Linux. Then create a library output file
nvcc file1.obj file2.obj file3.obj a_dlink.obj --lib -o myprogram.lib
Then run
nvcc myprogram.lib
which will produce an executable: a.exe on Windows or a.out on Linux. This procedure works for cubin and fatbin files too. Just substitute those names in place of ptx.
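The steps above can be collected into a small dry-run helper. This is only a sketch: the function name ptx_build_plan is hypothetical, it prints the nvcc commands rather than running them, and it assumes the device-link output is named a_dlink plus the platform's object extension (a_dlink.obj on Windows as seen above; a_dlink.o on Linux is an assumption carried over from that pattern). The library name myprogram.lib is kept from the example as-is.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper): prints the nvcc command sequence
# described above instead of executing it.
# Usage: ptx_build_plan OBJ_EXT NAME...
#   OBJ_EXT is "obj" on Windows or "o" on Linux; NAME is a source file
#   name without its .cu extension.
ptx_build_plan() {
    obj_ext=$1
    shift
    cu=""
    ptx=""
    obj=""
    for name in "$@"; do
        cu="$cu $name.cu"
        ptx="$ptx $name.ptx"
        obj="$obj $name.$obj_ext"
    done
    # Step 1: generate the PTX files to modify.
    echo "nvcc$cu -rdc=true --ptx"
    # Reminder only; this line is a shell comment, not a command.
    echo "# edit the .ptx files by hand here"
    # Step 2: device-link the PTX files into a_dlink.<ext>.
    echo "nvcc$ptx -dlink"
    # Step 3: compile the sources into object files.
    echo "nvcc$cu -rdc=true --compile"
    # Step 4: bundle the objects and the device-link object into a library.
    echo "nvcc$obj a_dlink.$obj_ext --lib -o myprogram.lib"
    # Step 5: produce the executable from the library.
    echo "nvcc myprogram.lib"
}

ptx_build_plan obj file1 file2 file3
```

Run the printed commands one at a time so that the PTX files can be edited between the first and second nvcc invocations. For cubin or fatbin, the extension and the --ptx flag in the first two commands would need to be swapped accordingly.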