I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions right after each other and it is difficult to write a cud
Expanding on @ArtemB's answer:
NVIDIA offers a runtime compilation library (NVRTC). There's an example of how it's used as part of the CUDA samples; you can access it here.
The sample actually starts from CUDA code, but the intermediate step is producing the PTX code as a plain C string (`char *`). From there, this is what you do, basically:
char* ptx;
size_t ptxSize;
// ... populate ptx and ptxSize somehow ...
CUcontext context;
CUdevice cuDevice;
// These next few lines simply initialize your work with the CUDA driver,
// they're not specific to PTX compilation
cuInit(0);
cuDeviceGet(&cuDevice, 0); // or some other device on your system
cuCtxCreate(&context, 0, cuDevice);
// The magic happens here:
CUmodule module;
cuModuleLoadDataEx(&module, ptx, 0, 0, 0); // no JIT options passed
// And here is how you use your compiled PTX
CUfunction kernel_addr;
cuModuleGetFunction(&kernel_addr, module, "my_kernel_name");
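cuLaunchKernel(kernel_addr,
    // launch parameters go here:
    //   gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ,
    //   sharedMemBytes, hStream
    // kernel arguments go here:
    //   kernelParams (a void*[] holding the address of each argument), extra
);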
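Since the question is about hand-edited PTX, one way to handle the "populate ptx and ptxSize somehow" step is to read the modified `.ptx` file from disk. Here is a minimal sketch in plain C; `read_ptx_file` is a hypothetical helper name of mine, not part of the CUDA API, and it assumes the file is small enough to hold in memory:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the CUDA API): reads a whole PTX file
 * into a NUL-terminated heap buffer. Returns NULL on failure; the caller
 * frees the result. If size_out is non-NULL, it receives the text length,
 * not counting the terminator. */
char *read_ptx_file(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Find the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    /* PTX is passed to cuModuleLoadDataEx as a NUL-terminated string. */
    buf[len] = '\0';
    if (size_out != NULL)
        *size_out = (size_t)len;
    return buf;
}
```

With that, `char* ptx = read_ptx_file("my_kernel.ptx", &ptxSize);` (the file name is just a placeholder) would stand in for the "populate" comment above. Check the result for `NULL` before passing it on, and `free(ptx)` once the module is loaded.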
Notes:
libnvrtc.so.