I am doing dynamic parallelism programming using CUDA 5.5 and an NVIDIA GeForce GTX 780, whose compute capability is 3.5. I am calling a kernel function inside a kernel function
You need to let nvcc generate compute capability 3.5 code for your device. You can do this by adding the following option to the nvcc command line:
-gencode arch=compute_35,code=sm_35
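As a minimal sketch, assuming a hypothetical source file named dp_demo.cu, the program below has a parent kernel launch a child kernel from the device. Besides the -gencode option, a dynamic parallelism build also needs relocatable device code (-rdc=true) and must link the device runtime library (-lcudadevrt), as the build line in the comment shows. The kernel and function names are illustrative only. Note that recent toolkits (CUDA 12 and later) no longer support sm_35, so there you would substitute your GPU's architecture.

```cuda
// Minimal dynamic parallelism sketch (needs compute capability >= 3.5).
// Assumed build line (dp_demo.cu is a hypothetical file name):
//   nvcc -gencode arch=compute_35,code=sm_35 -rdc=true dp_demo.cu -o dp_demo -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child kernel: each thread writes its global index into out.
__global__ void childKernel(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Parent kernel: one thread launches the child grid from the device.
__global__ void parentKernel(int* out, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(out, n);
    }
}

// Runs the demo; returns 0 on success, 1 on any failure.
int run_demo() {
    const int n = 1000;
    int* d_out = NULL;
    if (cudaMalloc((void**)&d_out, n * sizeof(int)) != cudaSuccess) return 1;

    parentKernel<<<1, 1>>>(d_out, n);
    cudaError_t err = cudaGetLastError();
    // A parent grid is not complete until its child grids complete,
    // so this host-side synchronization waits for both.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    std::vector<int> h_out(n);
    err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    // Verify that the child kernel wrote every index.
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) return 1;
    }
    return 0;
}

int main() {
    int rc = run_demo();
    std::puts(rc == 0 ? "ok" : "failed");
    return rc;
}
```

If the build omits -rdc=true or -lcudadevrt, nvcc typically rejects the device-side launch or the link fails, which is a common stumbling block alongside the missing -gencode option.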
See the CUDA samples on dynamic parallelism for more detail. They include both the command-line options and the project settings for all supported operating systems.
http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-