Can anyone describe the differences between __global__ and __device__?
When should I use __device__, and when to use __glob
I am recording some unfounded speculations here for the time being (I will substantiate these later when I come across some authoritative source)...
__device__ functions can have a return type other than void, but __global__ functions must always return void.
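To make the return-type difference concrete, here is a minimal sketch. The names `square` and `squareAll` are made up for illustration, and error checking is left out to keep it short, so treat it as an example rather than a reference implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__: callable only from code running on the GPU; may return a value.
// (square is a hypothetical helper, not part of any library.)
__device__ float square(float x) {
    return x * x;
}

// __global__: a kernel, launched from the host with <<<...>>>; must return void,
// so results are written to memory instead of being returned.
__global__ void squareAll(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = square(in[i]);  // ordinary call, executed by this same thread
    }
}

int main() {
    const int n = 8;
    float hostIn[n], hostOut[n];
    for (int i = 0; i < n; ++i) hostIn[i] = static_cast<float>(i);

    float* devIn = nullptr;
    float* devOut = nullptr;
    cudaMalloc(&devIn, n * sizeof(float));
    cudaMalloc(&devOut, n * sizeof(float));
    cudaMemcpy(devIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);

    // One block of n threads; each thread handles one element.
    squareAll<<<1, n>>>(devIn, devOut, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hostOut, devOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%g ", hostOut[i]);
    printf("\n");

    cudaFree(devIn);
    cudaFree(devOut);
    return 0;
}
```

Trying to call `square` directly from `main` would be rejected by nvcc, since a function marked only `__device__` is not compiled for the host.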
__global__ functions can also be launched from within other kernels running on the GPU to start additional GPU threads (as part of the CUDA dynamic parallelism model (aka CNP), which needs a device of compute capability 3.5 or higher), while __device__ functions are ordinary calls that run in the same thread as the caller.
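A rough sketch of the launch-versus-call distinction, assuming a GPU and toolkit that support dynamic parallelism. The names `doubled`, `childKernel`, and `parentKernel` are hypothetical, and the compile line in the comment is an assumption about a typical setup, not something taken from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed build line (adjust the architecture to your GPU, 3.5 or newer):
//   nvcc -arch=sm_70 -rdc=true example.cu -o example -lcudadevrt

// Plain device function: runs inside whichever thread calls it.
__device__ int doubled(int x) {
    return 2 * x;
}

// Child kernel: every thread of the new grid fills one element.
__global__ void childKernel(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = doubled(i);  // a call, no new threads are created
    }
}

// Parent kernel: a single thread launches a whole new grid from the GPU.
__global__ void parentKernel(int* data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 64;
        int blocks = (n + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(data, n);  // a launch, creates new threads
    }
}

int main() {
    const int n = 128;
    int host[n];
    int* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));

    parentKernel<<<1, 1>>>(dev, n);
    // The parent grid is not considered complete until its child grids finish,
    // so synchronizing on the host covers both.
    cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("host[%d] = %d\n", n - 1, host[n - 1]);

    cudaFree(dev);
    return 0;
}
```

Here `doubled(i)` stays within the calling thread, whereas `childKernel<<<...>>>` asks the runtime for a fresh grid of threads, which is the distinction the answer is describing.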