Is there an equivalent to memcpy() that works inside a CUDA kernel?

一个人的身影 2020-12-23 22:05

I'm trying to break apart and reshape the structure of an array asynchronously using the CUDA kernel. memcpy() doesn't work inside the kernel, and neither doe

3 Answers

感动是毒 (OP)

2020-12-23 22:27

cudaMemcpy() can indeed run asynchronously with respect to the host (for example, for device-to-device copies), but you're right, it can't be executed from within a kernel.

Is the new shape of the array determined by some calculation? If so, you would typically launch as many threads as there are entries in the array. Each thread computes the source and destination indices of a single entry and then copies it with a single assignment (dst[i] = src[j]). If the new shape is not based on a calculation, it might be more efficient to issue a series of cudaMemcpy() calls with cudaMemcpyDeviceToDevice from the host.
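To make both options concrete, here is a minimal sketch. It is only an illustration: the question doesn't say what the actual reshape is, so I'm assuming a row-major transpose as the "calculation", and the names sourceIndex, reshapeKernel, reshapeOnDevice and swapHalvesOnDevice are made up for this example. Error checking is left out to keep it short.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical index calculation: maps destination index i of the
// transposed (cols x rows) array to source index j of the original
// row-major (rows x cols) array. Replace with your own mapping.
__host__ __device__ inline int sourceIndex(int i, int rows, int cols) {
    int dstRow = i / rows;  // row in the transposed array
    int dstCol = i % rows;  // column in the transposed array
    return dstCol * cols + dstRow;
}

// One thread per array entry; each thread does a single assignment.
__global__ void reshapeKernel(float* dst, const float* src, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols) {  // guard the partially filled last block
        dst[i] = src[sourceIndex(i, rows, cols)];
    }
}

// Host-side wrapper: uploads src, runs the kernel, downloads the result.
// Assumes a non-empty array (rows * cols > 0).
std::vector<float> reshapeOnDevice(const std::vector<float>& src, int rows, int cols) {
    const int n = rows * cols;
    const size_t bytes = n * sizeof(float);
    std::vector<float> dst(n);

    float* d_src = nullptr;
    float* d_dst = nullptr;
    cudaMalloc(&d_src, bytes);
    cudaMalloc(&d_dst, bytes);
    cudaMemcpy(d_src, src.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    reshapeKernel<<<blocks, threads>>>(d_dst, d_src, rows, cols);

    // This copy is issued on the default stream, so it runs after the kernel.
    cudaMemcpy(dst.data(), d_dst, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_src);
    cudaFree(d_dst);
    return dst;
}

// Alternative when the rearrangement is a few fixed contiguous chunks:
// no kernel, just device-to-device copies issued from the host.
// Here the two halves of the array are swapped (n assumed even).
void swapHalvesOnDevice(float* d_dst, const float* d_src, int n) {
    const int half = n / 2;
    const size_t halfBytes = half * sizeof(float);
    cudaMemcpy(d_dst, d_src + half, halfBytes, cudaMemcpyDeviceToDevice);
    cudaMemcpy(d_dst + half, d_src, halfBytes, cudaMemcpyDeviceToDevice);
}
```

With this mapping, calling reshapeOnDevice on a 3 x 4 array should return its 4 x 3 transpose. The kernel version pays off when every element moves to a computed position. The cudaMemcpy version is usually only worth it when the data moves in a handful of large contiguous blocks, since each call carries its own launch overhead.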
