Question
Why should I use the CUDA Driver API, and in which cases can't I use the CUDA Runtime API (which is more convenient than the Driver API)?
Answer 1:
The runtime API is a higher-level abstraction over the driver API and is usually easier to use (e.g. you can use the kernel<<<>>> launch syntax); the performance gap should be minimal. The driver API is handle-based and provides a higher degree of control.
That "higher degree of control" means that with the driver API you have to deal with module initialization and memory management in a more verbose way, but that allows you to do more stuff, e.g. disable the driver JIT optimizations for the kernel code:
CU_JIT_OPTIMIZATION_LEVEL - Level of optimizations to apply to generated code (0 - 4), with 4 being the default and highest level of optimizations. Option type: unsigned int
From http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDA__TYPES_gfaa9995214a4f3341f48c5830cea0d8a.html
This isn't currently possible via code with the runtime API. A finer degree of control also means that you might break things or make them slower, so don't use these options if you don't know what they do.
You should usually use only one of the two APIs in your application, although with newer CUDA versions runtime API code can peacefully coexist with driver API code (http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf):
An application can mix runtime API code with driver API code.
Answer 2:
To add to and expand on the excellent answer by @Marco: one major feature that the driver API makes available is loading kernels at runtime. This is covered by the module portion of the driver API; here is the overview:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#module
With the runtime API, all kernels are automatically loaded during initialization and stay loaded for as long as the program runs. With the driver API, the programmer has explicit control over loading and unloading kernels. This can be used, for instance, to download updated kernel versions from the Internet. Another use is keeping only the currently relevant modules loaded, although this is rarely a concern given the typically small size of kernels relative to the rest of the program.
Source: https://stackoverflow.com/questions/27014480/why-should-i-use-the-cuda-driver-api-instead-of-cuda-runtime-api