When writing CUDA applications, you can work either at the driver level or at the runtime level, as illustrated in this image (the libraries are CUFFT and CUBLAS for advanced
A couple of important things to note:
First, the differences between the APIs only apply to the host-side code; the kernels are exactly the same. On the host side, the extra complexity of the driver API is pretty trivial. The fundamental differences are:
In the driver API you have access to functionality that is not available in the runtime API, such as contexts.
The emulator only works with code written for the runtime API.
Oh, and currently CUDPP, which is a very handy library, only works with the runtime API.
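To make the point about contexts concrete, here is a minimal host-side sketch of what explicit context handling looks like in the driver API. It is an illustration, not a reference implementation: it assumes device 0 exists, the `CHECK` macro is a hypothetical helper, and it uses the classic three-argument form of `cuCtxCreate`, whose signature has varied between toolkit versions, so check your `cuda.h` before copying it.

```c
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort with a message if a driver API call fails. */
#define CHECK(call)                                                    \
    do {                                                               \
        CUresult r_ = (call);                                          \
        if (r_ != CUDA_SUCCESS) {                                      \
            fprintf(stderr, "%s failed with error %d\n", #call, (int)r_); \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

int main(void)
{
    CUdevice    dev;
    CUcontext   ctx;
    CUdeviceptr buf;

    /* The driver API must be initialized explicitly. */
    CHECK(cuInit(0));

    /* Assumption: device 0 is the GPU we want. */
    CHECK(cuDeviceGet(&dev, 0));

    /* Create a context by hand and make it current on this thread.
       Assumption: the three-argument form of cuCtxCreate. */
    CHECK(cuCtxCreate(&ctx, 0, dev));

    /* Allocations belong to the current context. */
    CHECK(cuMemAlloc(&buf, 1024));
    CHECK(cuMemFree(buf));

    /* Tear the context down explicitly as well. */
    CHECK(cuCtxDestroy(ctx));

    puts("context created and destroyed");
    return 0;
}
```

This needs to be linked against the driver library (typically `-lcuda`). With the runtime API the same allocation is just a `cudaMalloc` call, and the context is set up implicitly the first time you use the device, which is why you never get to touch it directly there.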