Compiling code containing dynamic parallelism fails

Posted by 安稳与你 on 2019-11-26 17:07:26

Question


I am using dynamic parallelism with CUDA 5.5 on an NVIDIA GeForce GTX 780, whose compute capability is 3.5. When I launch a kernel from inside another kernel, compilation fails with this error:

error : calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above

What am I doing wrong?


Answer 1:


Compile for compute capability 3.5 with relocatable device code enabled, and link against the device runtime library:

nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt

Alternatively, if you have two files, simple1.cu and test.c, you can compile and link them in separate steps as shown below. This is called separate compilation.

nvcc -arch=sm_35 -dc simple1.cu 
nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt
g++ -c test.c 
g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ -lcudart

The same procedure is explained in the CUDA Programming Guide.
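
For reference, here is a minimal sketch of what a file like simple1.cu might contain. The question does not show its source, so the kernel names parent_kernel and child_kernel, the array size, and the launch configuration are all hypothetical. The sketch only illustrates a device-side kernel launch, which is the construct that triggers the error when the build does not target sm_35 or above.

```cuda
// simple1.cu: hypothetical minimal example of dynamic parallelism.
// Assumed build command (the first command in this answer):
//   nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread writes its own global index into out.
__global__ void child_kernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Parent kernel: one thread launches the child grid from device code.
// This device-side launch is what needs compute capability 3.5 or above,
// relocatable device code, and the cudadevrt library.
__global__ void parent_kernel(int *out, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        child_kernel<<<blocks, threads>>>(out, n);
    }
}

int main()
{
    const int n = 256;
    int h_out[n];
    int *d_out = NULL;

    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    parent_kernel<<<1, 1>>>(d_out, n);

    // Host-side synchronization: the parent grid is not complete until
    // the child grid it launched has finished.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    printf("h_out[%d] = %d\n", n - 1, h_out[n - 1]);
    cudaFree(d_out);
    return 0;
}
```

Compiling this file without an -arch option for sm_35 or above should reproduce an error of the kind quoted in the question, since the child launch sits inside a __global__ function.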




Answer 2:


From Visual Studio 2010:

1) View -> Property Pages
2) Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true)
3) Configuration Properties -> CUDA C/C++ -> Device -> Code Generation -> compute_35,sm_35
4) Configuration Properties -> Linker -> Input -> Additional Dependencies -> cudadevrt.lib



Answer 3:


You need to make nvcc generate code for compute capability 3.5, which your device supports. To do so, add this option to the nvcc command line:

 -gencode arch=compute_35,code=sm_35

See the CUDA samples on dynamic parallelism for more detail. They contain both the command-line options and the project settings for all supported operating systems.

http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-



Source: https://stackoverflow.com/questions/19287461/compiling-code-containing-dynamic-parallelism-fails
