Is there an equivalent to memcpy() that works inside a CUDA kernel?

一个人的身影 2020-12-23 22:05

I'm trying to break apart and reshape the structure of an array asynchronously using the CUDA kernel. memcpy() doesn't work inside the kernel, and neither doe

3 answers
  •  感动是毒
    2020-12-23 22:27

    cudaMemcpy() does indeed run asynchronously but you're right, it can't be executed from within a kernel.

    Is the new shape of the array determined by some calculation? If so, you would typically launch as many threads as there are entries in your array. Each thread computes the source and destination index of a single entry and copies it with a single assignment (dst[i] = src[j]). If the new shape is not based on a calculation, it might be more efficient to issue a series of cudaMemcpy() calls with cudaMemcpyDeviceToDevice from the host.
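
    To make the one-thread-per-entry idea concrete, here is a minimal sketch rather than the answerer's own code. It assumes, purely for illustration, that the "reshape" is a transpose of a row-major rows x cols matrix. The names reshapeKernel and reshapeOnDevice are hypothetical, as is the block size of 256.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical reshape rule (an assumption for this sketch): transpose a
// row-major rows x cols matrix. Each thread copies exactly one element.
__global__ void reshapeKernel(const float* src, float* dst, int rows, int cols)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // destination index
    if (i >= rows * cols) return;                   // guard the last, partial block

    // dst is laid out as cols x rows; find the source element that lands at i.
    int dstRow = i / rows;
    int dstCol = i % rows;
    int j = dstCol * cols + dstRow;                 // source index

    dst[i] = src[j];                                // the single assignment
}

// Hypothetical host helper: uploads the input, runs the kernel, downloads
// the result. Returns an empty vector on bad input or any CUDA error.
std::vector<float> reshapeOnDevice(const std::vector<float>& in, int rows, int cols)
{
    if (rows <= 0 || cols <= 0 ||
        in.size() != static_cast<size_t>(rows) * static_cast<size_t>(cols)) {
        return {};
    }

    const size_t n = in.size();
    const size_t bytes = n * sizeof(float);
    std::vector<float> out(n);
    float* dSrc = nullptr;
    float* dDst = nullptr;

    bool ok = cudaMalloc(&dSrc, bytes) == cudaSuccess &&
              cudaMalloc(&dDst, bytes) == cudaSuccess &&
              cudaMemcpy(dSrc, in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;  // assumed block size, tune as needed
        const int blocks = static_cast<int>((n + threads - 1) / threads);
        reshapeKernel<<<blocks, threads>>>(dSrc, dDst, rows, cols);

        // The blocking cudaMemcpy below also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), dDst, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dSrc);  // cudaFree(nullptr) is a harmless no-op
    cudaFree(dDst);
    if (!ok) out.clear();
    return out;
}
```

    Only the index arithmetic inside the kernel is specific to the transpose. A different reshape would replace the computation of j and keep the same dst[i] = src[j] pattern.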
