CUDA: allocation of an array of structs inside a struct

筅森魡賤 submitted on 2019-11-29 05:08:25

The problem is here:

cudaMalloc((void**)&nL,sizeof(NLayer));
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));

After the first line, nL holds the address of a structure in global memory on the device. In the second line, the first argument to cudaMalloc, &nL->neurons, is therefore an address residing on the GPU, and cudaMalloc tries to write the new pointer through it from host code. That is undefined behaviour (on my test system it causes a segfault; in your case, though, something more subtle happens).

The correct way to do what you want is to first create the structure in host memory, fill it with data, and then copy it to the device, like this:

NLayer* nL;
NLayer h_nL;
int i;
int tmp=9;
// Allocate data on device
cudaMalloc((void**)&nL, sizeof(NLayer));
cudaMalloc((void**)&h_nL.neurons, 6*sizeof(Neuron));
// Copy nlayer with pointers to device
cudaMemcpy(nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice);

Also, don't forget to always check for any errors from CUDA routines.
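As a rough sketch of how the allocation and the error checking could fit together: the struct layouts below are assumed (the original definitions are not shown in this answer), and CUDA_TRY and create_layer are hypothetical helper names, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed layouts, for illustration only.
struct Neuron { float* weights; float value; };
struct NLayer { int count; Neuron* neurons; };

// Hypothetical helper: print the CUDA error and return it from the caller.
#define CUDA_TRY(call)                                              \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            return err_;                                            \
        }                                                           \
    } while (0)

// Builds an NLayer with `count` neurons on the device.
// *d_layer_out receives the device pointer to the layer.
// *h_layer_out receives a host mirror whose `neurons` member is a device pointer.
// Note: this sketch does not free earlier allocations if a later call fails.
cudaError_t create_layer(int count, NLayer** d_layer_out, NLayer* h_layer_out) {
    NLayer h_layer;
    h_layer.count = count;
    h_layer.neurons = nullptr;
    NLayer* d_layer = nullptr;

    // Both first arguments are host addresses, so cudaMalloc can write to them.
    CUDA_TRY(cudaMalloc((void**)&d_layer, sizeof(NLayer)));
    CUDA_TRY(cudaMalloc((void**)&h_layer.neurons, count * sizeof(Neuron)));
    // Zero the neuron array so its weights pointers start out as null.
    CUDA_TRY(cudaMemset(h_layer.neurons, 0, count * sizeof(Neuron)));
    // Copy the host struct, device pointer included, to the device.
    CUDA_TRY(cudaMemcpy(d_layer, &h_layer, sizeof(NLayer), cudaMemcpyHostToDevice));

    *d_layer_out = d_layer;
    *h_layer_out = h_layer;
    return cudaSuccess;
}
```

Keeping the host mirror around is handy later, because any address inside the neuron array can then be computed from h_layer.neurons without touching device memory.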

UPDATE

In the second version of your code:

cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,...) --- again, you are dereferencing a device pointer (d_layer) on the host. Instead, you should use

cudaMemcpy(&h_layer.neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice);

Here you take h_layer (the host structure) and read its member h_layer.neurons, which is a pointer to device memory. Then you do some pointer arithmetic on it (&h_layer.neurons[i].weights). No access to device memory is needed to compute this address.
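A minimal sketch of that step wrapped in a function, assuming Neuron has a float* weights member and NLayer has a Neuron* neurons member (the real definitions are not shown here). attach_weights is a made-up name for illustration.

```cuda
#include <cuda_runtime.h>

// Assumed layouts, for illustration only.
struct Neuron { float* weights; float value; };
struct NLayer { int count; Neuron* neurons; };

// h_layer is the HOST copy of the layer; h_layer.neurons already points to
// device memory. Allocates n_weights floats on the device and stores that
// pointer into neurons[i].weights on the device.
cudaError_t attach_weights(const NLayer& h_layer, int i, int n_weights,
                           float** d_weights_out) {
    float* d_weights = nullptr;
    cudaError_t err = cudaMalloc((void**)&d_weights, n_weights * sizeof(float));
    if (err != cudaSuccess) return err;

    // Computed on the host from the pointer value alone; nothing on the
    // device is read to form this address.
    float** d_slot = &h_layer.neurons[i].weights;

    // Copy the pointer value itself (sizeof(float*) bytes) into the device slot.
    err = cudaMemcpy(d_slot, &d_weights, sizeof(float*), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(d_weights);
        return err;
    }
    *d_weights_out = d_weights;
    return cudaSuccess;
}
```

Calling this once per neuron in a loop over i fills in every weights pointer without ever dereferencing a device pointer on the host.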

It all depends on the GPU card you're using. Fermi cards use uniform addressing of the shared and global memory spaces, while pre-Fermi cards don't.

For the pre-Fermi case, you don't know if the address should be shared or global. The compiler can usually figure this out, but there are cases where it can't. When a pointer to shared memory is required, you usually take the address of a shared variable, and the compiler can recognise this. The message "assuming global" appears when the compiler cannot tell which memory space the pointer refers to.
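To make this concrete, here is a small sketch (not taken from the original question) of a kernel where a pointer may refer to either shared or global memory depending on a run-time flag. The names double_values, run_double and TILE are made up for illustration. With uniform addressing this works as written; whether a pre-Fermi compiler prints the warning for this exact pattern is an assumption.

```cuda
#include <cuda_runtime.h>

#define TILE 128

// Doubles every element of data. Each thread reads its value either from a
// shared-memory staging slot or straight from global memory.
__global__ void double_values(float* data, int n, bool via_shared) {
    __shared__ float tile[TILE];
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    // Each thread only touches its own slot, so no barrier is needed.
    tile[threadIdx.x] = data[idx];

    // The memory space p points into is only known at run time.
    float* p = via_shared ? &tile[threadIdx.x] : &data[idx];
    data[idx] = 2.0f * (*p);
}

// Host wrapper: copies h_data to the device, runs the kernel, copies back.
cudaError_t run_double(float* h_data, int n, bool via_shared) {
    float* d_data = nullptr;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc((void**)&d_data, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (n + TILE - 1) / TILE;
        double_values<<<blocks, TILE>>>(d_data, n, via_shared);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err;
}
```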

If you are using a GPU that has compute capability 2.x or higher, it should work with the -arch=sm_20 compiler flag.
