Copying a multi-branch tree to GPU memory

守給你的承諾、 submitted on 2019-12-11 10:06:43

Question


I have a tree of nodes and I am trying to copy it to GPU memory. The Node looks like this:

struct Node
{
   char *Key;
   int ChildCount;
   Node *Children;
};

And my copy function looks like this:

void CopyTreeToDevice(Node* node_s, Node* node_d)
{


     //allocate node on device and copy host node
     cudaMalloc( (void**)&node_d, sizeof(Node));
     cudaMemcpy(node_d, node_s, sizeof(Node), cudaMemcpyHostToDevice);

     //test
     printf("ChildCount of node_s looks to be : %d\n", node_s->ChildCount);
     printf("Key of node_s looks to be : %s\n", node_s->Key);

     Node *temp;
     temp =(Node *) malloc(sizeof(Node));
     cudaMemcpy(temp, node_d, sizeof(Node), cudaMemcpyDeviceToHost);
     printf("ChildCount of node_d on device is actually : %d\n", temp->ChildCount);
     printf("Key of node_d on device is actually : %s\n", temp->Key);
     free(temp);



     //       continue with child nodes
     if(node_s->ChildCount > 0)
     {
         //problem here
         cudaMalloc( (void**)&(node_d->Children), sizeof(Node)*(node_s->ChildCount));

         cudaMemcpy(node_d->Children, node_s->Children, 
                    sizeof(Node)*node_s->ChildCount, cudaMemcpyHostToDevice);

         for(int i=0;i<node_s->ChildCount;i++)
         {
                 CopyTreeToDevice(&(node_s->Children[i]), &(node_d->Children[i]));
         }
     }

}

But I have a problem with this line:

cudaMalloc( (void**)&(node_d->Children), sizeof(Node)*(node_s->ChildCount));

It gives me an access violation exception. The test section works smoothly, and there is no problem initializing the fields.

Here is the output of the test section:

ChildCount of node_s looks to be : 35
Key of node_s looks to be : root
ChildCount of node_d on device is actually : 35
Key of node_d on device is actually : root

What is the reason for this?

Thanks.


Answer 1:


node_d->Children is a variable that resides in device memory. You cannot access it directly from host code, as you do in your second cudaMalloc. Moreover, copying host pointers to the device makes little sense, because you cannot dereference them in device code.

A nicer and much quicker approach would be to:

  • Preallocate one big array for your whole tree.
  • Use array indices instead of pointers. Indices remain valid across transfers to and from the device.
  • Allocate the whole array once on the device. Multiple cudaMalloc calls can be inefficient (especially on Windows systems when a monitor is connected to that GPU). Also, since cudaMalloc returns an address that is always aligned to 512 bytes, you practically cannot allocate smaller chunks of memory. So, with your current code, every Children array will consume at least 512 bytes, even if it holds only 2 children.
  • Copy the whole array once from host to device. This is much faster than issuing multiple cudaMemcpy calls, even if you end up copying some unused memory.
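
The list above can be sketched on the host side roughly as follows. This is only an illustration under stated assumptions, not the answerer's code. FlatNode, FlatTree, AddKey, FlattenInto and Flatten are hypothetical names, and Key is declared const char * here (the question uses char *) so that string literals compile cleanly. Keys are packed into a single character pool, so they are also referenced by offset rather than by pointer.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Host-side tree node, as in the question (Key made const for this sketch).
struct Node {
    const char *Key;
    int ChildCount;
    Node *Children;
};

// Hypothetical flat layout: indices and offsets instead of pointers.
struct FlatNode {
    int KeyOffset;   // offset of the NUL-terminated key inside FlatTree::keys
    int ChildCount;
    int FirstChild;  // index of the first child in FlatTree::nodes, -1 if none
};

// Two contiguous buffers; each could be sent to the device with one copy.
struct FlatTree {
    std::vector<FlatNode> nodes;
    std::vector<char> keys;
};

// Append a key (including its terminating NUL) to the pool; return its offset.
static int AddKey(FlatTree &tree, const char *key) {
    int offset = static_cast<int>(tree.keys.size());
    tree.keys.insert(tree.keys.end(), key, key + std::strlen(key) + 1);
    return offset;
}

// Fill slot `index` from `src`, reserving one contiguous block for its children
// before recursing, so siblings stay adjacent in the array.
static void FlattenInto(const Node &src, int index, FlatTree &tree) {
    assert(src.ChildCount >= 0);
    int first = src.ChildCount > 0 ? static_cast<int>(tree.nodes.size()) : -1;
    int keyOffset = AddKey(tree, src.Key);
    tree.nodes[index] = FlatNode{keyOffset, src.ChildCount, first};
    tree.nodes.resize(tree.nodes.size() + src.ChildCount);
    for (int i = 0; i < src.ChildCount; i++) {
        FlattenInto(src.Children[i], first + i, tree);
    }
}

// Flatten a whole tree; the root ends up at index 0.
FlatTree Flatten(const Node &root) {
    FlatTree tree;
    tree.nodes.resize(1);
    FlattenInto(root, 0, tree);
    return tree;
}
```

With this layout, child i of node n is nodes[nodes[n].FirstChild + i] on both host and device. The nodes buffer can then be transferred with a single cudaMalloc and a single cudaMemcpy of nodes.size() * sizeof(FlatNode) bytes, and the keys buffer likewise.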



Answer 2:


It looks like node_d itself is on the GPU. You cannot access structures on the GPU from host code using -> or the . operator. You need to copy node_d back to the host, allocate the necessary data, and then copy it back to the device.



Source: https://stackoverflow.com/questions/6336992/copying-a-multi-branch-tree-to-gpu-memory
