Dynamically allocating memory inside __device/global__ CUDA kernel

元气小坏坏 提交于 2020-01-04 02:44:32

问题


According to the CUDA Programming Guide , Page 122, it is possible to dynamically allocate memory inside a device/global function so long as we're using compute architecture 2.x.

My problem is that when I attempt this I get the command line message:

The command "some command" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" etc...

This is followed by an error saying that you cannot call a host function (malloc) from a device/global function.

The above message is showing that it is attempting to compile under compute 1.x. I am using VS2010 and have "Code Generation" set to "compute_20,sm_20" in the "CUDA C/C++" property page, so I am not sure why it is still trying to compile under compute 1.x. I am definitely using a card that supports 2.x. Any ideas?


回答1:


You should be able to see the nvcc command line in the output. In fact, I think that bit you pasted with all the -gencode/etc. in it is your command line. Therefore, it is also proof that you are compiling the code for both sm_10 and sm_20, which is why you get the error when you call malloc.

You can confirm by wrapping the calls to malloc with #if __CUDA_ARCH__ >= 200 and see if the error goes away.

I'm guessing that you set the properties to compile for sm_20 in the default properties for .cu files in your project, but after you added the .cu file to the project. When the file was added to the project, the defaults were probably set to sm_10 and sm_20 (which is the default for the .rules file). If you right-click on the file itself you might see that sm_20 is checked. Just a hunch.



来源:https://stackoverflow.com/questions/6937693/dynamically-allocating-memory-inside-device-global-cuda-kernel

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!