Compiling for Compute Capability 2.x in CUDA C for VS2010

Question

I was following the question "Dynamically allocating memory inside __device/global__ CUDA kernel", but my code still doesn't compile:

error : calling a host function("_malloc_dbg") from a __device__/__global__  
function("kernel") is not allowed

error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA  
\v4.1\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\"  
--use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual  
Studio 10.0\VC\bin\x86_amd64" -I"..\..\..\Source\Include" -G0  --keep-dir   
"x64\Debug" -maxrregcount=0  --machine 64 --compile  -g  -Xcompiler "/EHsc /nologo 
/Od /Zi  /MDd " -o "x64\Debug\move.cu.obj"  "C:\Source\scene\move.cu"" exited with  
code 2. C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA  
4.1.targets     361 10

As suggested, I added #if __CUDA_ARCH__ >= 200 and it returns false.

What else could be the issue? I'm running on a GTX 480.

Edit: I have this warning as well: #warning C4005: '_malloca' : macro redefinition

Answer 1:

I understand you solved your main problem, but one question remains:

I added #if __CUDA_ARCH__ >= 200 and it returns false.

CUDA code is compiled at least twice: one pass generates the CPU (host) code and another generates the device code. __CUDA_ARCH__ is defined only during device code generation. nvcc can also run additional device passes to produce GPU code for several architectures; the CPU code stays the same across them, but the GPU code differs.

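I suspect that you are testing #if __CUDA_ARCH__ >= 200 while the CPU code is being produced. In that pass the macro is undefined, the preprocessor treats it as 0, and the condition is false.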
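To make the two passes concrete, here is a minimal sketch. It is not the asker's code: the helper arch_seen() and the body of kernel are made up for illustration, and it assumes a build that targets compute capability 2.0 or later. The same source yields a different value for __CUDA_ARCH__ depending on which pass compiles it, and the in-kernel malloc sits inside the guard so that only a device pass for a suitable architecture sees it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: reports what the current compilation pass sees.
// The host pass never defines __CUDA_ARCH__, so it compiles the #else branch.
__host__ __device__ int arch_seen()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // device pass, e.g. 200 when built for compute_20
#else
    return 0;               // host pass: the macro is not defined
#endif
}

// Hypothetical kernel: device-side malloc is only compiled when the device
// pass targets compute capability 2.0 or later.
__global__ void kernel(int *out)
{
#if __CUDA_ARCH__ >= 200
    int *p = (int *)malloc(sizeof(int));   // allocates from the device heap
    if (p != NULL) {
        *p = arch_seen();
        *out = *p;
        free(p);
    } else {
        *out = -1;                         // device heap allocation failed
    }
#else
    *out = -1;                             // host pass or older architecture
#endif
}

int main()
{
    int *d_out = NULL;
    int h_out = 0;

    cudaMalloc(&d_out, sizeof(int));
    kernel<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // arch_seen() called here runs the host-compiled version.
    printf("host pass sees %d, device pass saw %d\n", arch_seen(), h_out);
    return 0;
}
```

A check placed in host code, such as inside main, will therefore always take the false branch, whatever GPU is installed. Error checking on the CUDA runtime calls is left out to keep the sketch short.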

Source: https://stackoverflow.com/questions/9056183/compiling-for-compute-capability-2-x-in-cuda-c-for-vs2010

Tags

cuda

gpu