I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions right after each other and it is difficult to write a cud
You can load a cubin or fatbin at runtime using the cuModuleLoad* functions of the CUDA driver API. They are documented in the module management section of the CUDA Driver API reference.
These functions also accept PTX, so you can use them to include PTX in your build, though the method is somewhat convoluted. For instance, Suricata compiles its .cu files into PTX files for different architectures, converts them into a .h file that contains the PTX code as a C array, and then includes that header from one of its source files during the build.