Question
I have a tree of nodes and I am trying to copy it to GPU memory. The Node looks like this:
struct Node
{
    char *Key;
    int ChildCount;
    Node *Children;
};
And my copy function looks like this:
void CopyTreeToDevice(Node* node_s, Node* node_d)
{
    //allocate node on device and copy host node
    cudaMalloc( (void**)&node_d, sizeof(Node));
    cudaMemcpy(node_d, node_s, sizeof(Node), cudaMemcpyHostToDevice);
    //test
    printf("ChildCount of node_s looks to be : %d\n", node_s->ChildCount);
    printf("Key of node_s looks to be : %s\n", node_s->Key);
    Node *temp;
    temp =(Node *) malloc(sizeof(Node));
    cudaMemcpy(temp, node_d, sizeof(Node), cudaMemcpyDeviceToHost);
    printf("ChildCount of node_d on device is actually : %d\n", temp->ChildCount);
    printf("Key of node_d on device is actually : %s\n", temp->Key);
    free(temp);
    // continue with child nodes
    if(node_s->ChildCount > 0)
    {
        //problem here
        cudaMalloc( (void**)&(node_d->Children), sizeof(Node)*(node_s->ChildCount));
        cudaMemcpy(node_d->Children, node_s->Children,
                   sizeof(Node)*node_s->ChildCount, cudaMemcpyHostToDevice);
        for(int i=0;i<node_s->ChildCount;i++)
        {
            CopyTreeToDevice(&(node_s->Children[i]), &(node_d->Children[i]));
        }
    }
}
But I have a problem with this line:
cudaMalloc( (void**)&(node_d->Children), sizeof(Node)*(node_s->ChildCount));
It gives me an access violation exception. The test section works smoothly; there is no problem initializing the fields.
Here is the output of the test section:
ChildCount of node_s looks to be : 35
Key of node_s looks to be : root
ChildCount of node_d on device is actually : 35
Key of node_d on device is actually : root
What is the reason for this?
Thanks.
Answer 1:
node_d->Children is a variable which resides in device memory. You cannot access it directly from host code, as you do with your second cudaMalloc. Moreover, copying host pointers to the device makes little sense, as you cannot dereference them in device code.
A nicer and much quicker approach would be to:
- Preallocate a big array for your whole tree.
- Use an array index instead of pointers. The validity of indices will be preserved upon transfers to and from device.
- Allocate the whole array once on the device. Having multiple memAlloc calls may be inefficient (especially on Windows systems, when a monitor is connected to that GPU). Also, since memAlloc returns an address which is always aligned to 512 bytes, you practically cannot allocate smaller chunks of memory. So, with your current code, every children array will consume at least 512 bytes, even if there are only 2 children inside.
- Copy the whole array once from host to device. This is much faster than having multiple memCopy instructions, even if you actually copy some extra region of memory which is unused.
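The list above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: FlatNode, FlatTree, Flatten and UploadFlatTree are hypothetical names that do not appear in the question, and storing the keys in one shared character pool is an assumption made so that the Key pointer can also be replaced by an offset. The upload part is only compiled under nvcc, and error checking is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

struct Node {                 // pointer-based node from the question
    char* Key;
    int   ChildCount;
    Node* Children;
};

// Hypothetical flat layout: indices and offsets instead of pointers,
// so the values stay valid after a transfer to or from the device.
struct FlatNode {
    int KeyOffset;            // offset of the NUL-terminated key in FlatTree::keys
    int ChildCount;
    int FirstChild;           // index of the first child in FlatTree::nodes, -1 if none
};

struct FlatTree {
    std::vector<FlatNode> nodes;   // nodes[0] is the root
    std::vector<char>     keys;    // all keys stored back to back
};

// Breadth-first flattening: the children of a node end up contiguous,
// so FirstChild plus ChildCount is enough to find them.
FlatTree Flatten(const Node& root)
{
    FlatTree t;
    std::vector<const Node*> src{ &root };      // src[i] is the source of nodes[i]
    for (std::size_t i = 0; i < src.size(); ++i) {
        const Node* n = src[i];
        assert(n->ChildCount >= 0);
        FlatNode f;
        f.KeyOffset  = static_cast<int>(t.keys.size());
        const char* key = n->Key ? n->Key : "";
        t.keys.insert(t.keys.end(), key, key + std::strlen(key) + 1);
        f.ChildCount = n->ChildCount;
        f.FirstChild = n->ChildCount > 0 ? static_cast<int>(src.size()) : -1;
        for (int c = 0; c < n->ChildCount; ++c)
            src.push_back(&n->Children[c]);      // children are visited later, in order
        t.nodes.push_back(f);
    }
    return t;
}

#ifdef __CUDACC__
// One allocation and one copy per array, instead of one per node.
void UploadFlatTree(const FlatTree& t, FlatNode** nodes_d, char** keys_d)
{
    cudaMalloc((void**)nodes_d, t.nodes.size() * sizeof(FlatNode));
    cudaMemcpy(*nodes_d, t.nodes.data(), t.nodes.size() * sizeof(FlatNode),
               cudaMemcpyHostToDevice);
    cudaMalloc((void**)keys_d, t.keys.size());
    cudaMemcpy(*keys_d, t.keys.data(), t.keys.size(), cudaMemcpyHostToDevice);
}
#endif
```

In a kernel, child c of node i would then be nodes[nodes[i].FirstChild + c], and its key would start at keys + nodes[i].KeyOffset.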
Answer 2:
It looks like node_d itself is on the GPU. You cannot access structures on the GPU from host code using -> or . (the member access operators). You need to copy node_d back to the host, allocate the necessary data, and then copy it back to the device.
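If the pointer-based layout has to stay, one way to follow this advice might look like the sketch below. It is a variant of the idea rather than the exact steps described: instead of copying node_d back, it patches a host-side staging copy with device addresses and uploads that. CopyNodesToDevice is a hypothetical name, the Key strings are assumed to be NUL-terminated, error checking is omitted, and the stand-ins in the #else branch are an assumption added only so the logic can be tried without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#else
// Host-only stand-ins (assumption): "device" memory is plain heap memory here.
enum cudaMemcpyKind { cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost };
inline int cudaMalloc(void** p, std::size_t n) { *p = std::malloc(n); return *p ? 0 : 1; }
inline int cudaMemcpy(void* dst, const void* src, std::size_t n, cudaMemcpyKind)
{ std::memcpy(dst, src, n); return 0; }
#endif

struct Node {
    char* Key;
    int   ChildCount;
    Node* Children;
};

// Copies `count` host nodes, plus everything below them, to the device and
// returns the device address of the new array.
Node* CopyNodesToDevice(const Node* nodes_h, int count)
{
    assert(nodes_h != nullptr && count > 0);

    // Staging copy on the host: pointer fields are patched here,
    // never through a device pointer.
    std::vector<Node> staging(nodes_h, nodes_h + count);

    for (int i = 0; i < count; ++i) {
        // The key string needs its own device copy too.
        if (nodes_h[i].Key != nullptr) {
            std::size_t len = std::strlen(nodes_h[i].Key) + 1;
            char* key_d = nullptr;
            cudaMalloc((void**)&key_d, len);
            cudaMemcpy(key_d, nodes_h[i].Key, len, cudaMemcpyHostToDevice);
            staging[i].Key = key_d;
        }
        // Children first, so their device address is known before the parent is uploaded.
        staging[i].Children = nodes_h[i].ChildCount > 0
            ? CopyNodesToDevice(nodes_h[i].Children, nodes_h[i].ChildCount)
            : nullptr;
    }

    Node* nodes_d = nullptr;
    cudaMalloc((void**)&nodes_d, sizeof(Node) * count);
    cudaMemcpy(nodes_d, staging.data(), sizeof(Node) * count, cudaMemcpyHostToDevice);
    return nodes_d;
}
```

A call such as Node* root_d = CopyNodesToDevice(&root, 1); would return a pointer that is only meaningful in device code. Note that this still performs one allocation per child array and per key, so the cost concerns raised in Answer 1 still apply.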
Source: https://stackoverflow.com/questions/6336992/copying-a-multi-branch-tree-to-gpu-memory