Well, I have quite a delicate question :)
Let\'s start with what I have:
In his comment, Roger Dahl has linked the following post
Passing the PTX program to the CUDA driver directly
in which the use of two functions, namely cuModuleLoad and cuModuleLoadDataEx, are addressed. The former is used to load PTX code from file and passing it to the nvcc compiler driver. The latter avoids I/O and enables to pass the PTX code to the driver as a C string. In either cases, you need to have already at your disposal the PTX code, either as the result of the compilation of a CUDA kernel (to be loaded or copied and pasted in the C string) or as an hand-written source.
But what happens if you have to create the PTX code on-the-fly starting from a CUDA kernel? Following the approach in CUDA Expression templates, you can define a string containing your CUDA kernel like
ss << "extern \"C\" __global__ void kernel( ";
ss << def_line.str() << ", unsigned int vector_size, unsigned int number_of_used_threads ) { \n";
ss << "\tint idx = blockDim.x * blockIdx.x + threadIdx.x; \n";
ss << "\tfor(unsigned int i = 0; i < ";
ss << "(vector_size + number_of_used_threads - 1) / number_of_used_threads; ++i) {\n";
ss << "\t\tif(idx < vector_size) { \n";
ss << "\t\t\t" << eval_line.str() << "\n";
ss << "\t\t\tidx += number_of_used_threads;\n";
ss << "\t\t}\n";
ss << "\t}\n";
ss << "}\n\n\n\n";
then using system calls to compile it as
int nvcc_exit_status = system(
(std::string(NVCC) + " -ptx " + NVCC_FLAGS + " " + kernel_filename
+ " -o " + kernel_comp_filename).c_str()
);
if (nvcc_exit_status) {
std::cerr << "ERROR: nvcc exits with status code: " << nvcc_exit_status << std::endl;
exit(1);
}
and finally use cuModuleLoad and cuModuleGetFunction to load the PTX code from file and passing it to the compiler driver like
result = cuModuleLoad(&cuModule, kernel_comp_filename.c_str());
assert(result == CUDA_SUCCESS);
result = cuModuleGetFunction(&cuFunction, cuModule, "kernel");
assert(result == CUDA_SUCCESS);
Of course, expression templates have nothing to do with this problem and I'm only quoting the source of the ideas I'm reporting in this answer.