Dynamically allocating memory inside __device/global__ CUDA kernel

Question

According to the CUDA Programming Guide (page 122), it is possible to dynamically allocate memory inside a __device__ or __global__ function, as long as the code targets compute capability 2.x.

My problem is that when I attempt this I get the command line message:

The command "some command" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" etc...

This is followed by an error saying that you cannot call a host function (malloc) from a device/global function.

The above message is showing that it is attempting to compile under compute 1.x. I am using VS2010 and have "Code Generation" set to "compute_20,sm_20" in the "CUDA C/C++" property page, so I am not sure why it is still trying to compile under compute 1.x. I am definitely using a card that supports 2.x. Any ideas?

Answer 1:

You should be able to see the nvcc command line in the output. In fact, I think that bit you pasted with all the -gencode/etc. in it is your command line. Therefore, it is also proof that you are compiling the code for both sm_10 and sm_20, which is why you get the error when you call malloc.

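You can confirm this by wrapping the calls to malloc in #if __CUDA_ARCH__ >= 200 ... #endif and checking whether the error goes away. The guard hides malloc from the sm_10 compilation pass, so only the sm_20 pass sees it.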
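A minimal sketch of that guard, in case it helps. The kernel name scratchKernel, the buffer sizes, and the -1 fallback value are made up for illustration and are not from the question. It assumes a toolkit that still accepts the sm_10 and sm_20 targets from the command line above; on a newer toolkit the same guard compiles, but the condition is simply always true in the device pass.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small scratch buffer
// on the device heap, fills it, and reports the last element.
__global__ void scratchKernel(int *out, int n)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
    // Device-side malloc/free require compute capability 2.0 or higher.
    int *scratch = static_cast<int *>(malloc(n * sizeof(int)));
    if (scratch == NULL) {
        out[threadIdx.x] = -1;   // device heap exhausted
        return;
    }
    for (int i = 0; i < n; ++i) {
        scratch[i] = i;
    }
    out[threadIdx.x] = scratch[n - 1];
    free(scratch);
#else
    // Pass compiled for an older architecture (or the host pass):
    // no device-side heap, so only write a sentinel value.
    out[threadIdx.x] = -1;
#endif
}

int main()
{
    const int threads = 4;
    const int n = 8;
    int h_out[threads];
    int *d_out = NULL;

    cudaMalloc((void **)&d_out, threads * sizeof(int));
    scratchKernel<<<1, threads>>>(d_out, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int t = 0; t < threads; ++t) {
        printf("thread %d -> %d\n", t, h_out[t]);
    }
    cudaFree(d_out);
    return 0;
}
```

If this builds cleanly with both -gencode options present, the original error came from the sm_10 pass. The guard is only a diagnostic: the real fix is to stop generating sm_10 code for that file.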

I'm guessing that you set the project-wide default properties for .cu files to compile for sm_20, but only after you had already added this .cu file to the project. When the file was added, it probably picked up the defaults at that time, sm_10 and sm_20 (the default in the .rules file), as its own per-file settings, and those override the project defaults. If you right-click the file itself and open its properties, you might see that sm_10 is still checked there. Just a hunch.

Source: https://stackoverflow.com/questions/6937693/dynamically-allocating-memory-inside-device-global-cuda-kernel

Tags

visual-studio-2010

cuda

parallel-processing