If you are using Visual Studio Pro on Windows, I suggest you run a test application under NVIDIA's Parallel Nsight. I think it can show you the timestamps from the method call to the actual execution. In any case, some penalty is inherent, but it will be negligible if your kernels run long enough.
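To get a feel for "negligible", here is a minimal back-of-the-envelope sketch. It assumes the penalty is a fixed cost per launch, which is a simplification. The function name `launch_overhead_fraction` is made up for illustration, and any numbers you feed it should come from your own profiler run, not from this post.

```cpp
#include <cassert>

// Hypothetical helper: fraction of the total time (penalty + kernel)
// that is spent on a fixed per-launch penalty.
// Both arguments must be in the same unit, e.g. microseconds.
double launch_overhead_fraction(double launch_penalty, double kernel_time) {
    // Assumption: the penalty is non-negative and the kernel takes some time.
    assert(launch_penalty >= 0.0 && kernel_time > 0.0);
    return launch_penalty / (launch_penalty + kernel_time);
}
```

For example, with a penalty equal to the kernel time the function returns 0.5, so half of the time is overhead. If the kernel runs 1000 times longer than the penalty, the overhead drops below 0.1% of the total.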