Question
I was following this: Dynamically allocating memory inside __device/global__ CUDA kernel
But it still doesn't compile.
error : calling a host function("_malloc_dbg") from a __device__/__global__
function("kernel") is not allowed
error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
\v4.1\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\"
--use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual
Studio 10.0\VC\bin\x86_amd64" -I"..\..\..\Source\Include" -G0 --keep-dir
"x64\Debug" -maxrregcount=0 --machine 64 --compile -g -Xcompiler "/EHsc /nologo
/Od /Zi /MDd " -o "x64\Debug\move.cu.obj" "C:\Source\scene\move.cu"" exited with
code 2. C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA
4.1.targets 361 10
As suggested, I added #if __CUDA_ARCH__ >= 200
but the condition evaluates to false.
What else could be the issue? I'm running on a GTX 480.
Edit: I have this warning as well: #warning C4005: '_malloca' : macro redefinition
Answer 1:
I understand you solved your main problem, but one question remains:
I added
#if __CUDA_ARCH__ >= 200
and it returns false.
CUDA code is compiled at least twice. One pass generates the CPU (host) code and another generates the device code. __CUDA_ARCH__ is defined only during device code generation.
It is possible to run even more compilation passes and produce GPU code for several architectures. The CPU code stays the same across those passes, but the GPU code differs for each architecture.
I suspect that you are testing #if __CUDA_ARCH__ >= 200 in the pass that produces CPU code. There the macro is undefined, the preprocessor treats it as 0, and the condition is false.
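To see the separate passes for yourself, here is a minimal sketch. The names compiled_arch, report_arch and device_arch are hypothetical and not from the original post. The same function body is compiled once for the host and once per device architecture, so it returns a different value depending on which pass produced the code that is running.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: reports which compilation pass produced this body.
// Device passes define __CUDA_ARCH__ (e.g. 200 for compute_20);
// the host pass leaves it undefined, so we return 0 there.
__host__ __device__ inline int compiled_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;
#else
    return 0;
#endif
}

// Kernel that stores the device-side value so the host can read it back.
__global__ void report_arch(int *out)
{
    *out = compiled_arch();
}

// Hypothetical host wrapper: launches the kernel and returns the value
// seen in device code, or -1 if any CUDA call fails.
int device_arch()
{
    int *d_out = NULL;
    int h_out = -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess)
        return -1;
    report_arch<<<1, 1>>>(d_out);
    if (cudaMemcpy(&h_out, d_out, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        h_out = -1;
    cudaFree(d_out);
    return h_out;
}
```

Called from host code, compiled_arch() returns 0, which matches the false result of #if __CUDA_ARCH__ >= 200 in the host pass. With the build flags shown in the question (compute_20, sm_20) and a Fermi card such as the GTX 480, device_arch() should report 200, although that depends on which architectures the build actually targets.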
Source: https://stackoverflow.com/questions/9056183/compiling-for-compute-capability-2-x-in-cuda-c-for-vs2010