Fastest way to access device vector elements directly on host

北慕城南 submitted on 2019-12-05 15:19:45

The "more efficient techniques" the guide alludes to are the Thrust algorithms. It's more efficient to access (or copy across the PCI-E bus) millions of elements at once than it is to access a single element because the fixed cost of CPU/GPU communication is amortized.
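To make the amortization point concrete, here is a minimal sketch that contrasts the two access patterns. The helper names `sum_elementwise` and `sum_bulk` are invented for illustration and are not part of Thrust; the sketch assumes a CUDA toolchain with Thrust available and compilation with nvcc.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/copy.h>
#include <cstddef>

// Hypothetical helper: reads every element through operator[] on the host.
// Each D[i] read is its own small device-to-host transfer, so the fixed
// communication cost is paid once per element.
long long sum_elementwise(const thrust::device_vector<int>& D)
{
    long long total = 0;
    for (std::size_t i = 0; i < D.size(); ++i) {
        int v = D[i];  // implicit device-to-host read of one element
        total += v;
    }
    return total;
}

// Hypothetical helper: copies the whole range in one call, then sums on the host.
// The fixed communication cost is paid once for all elements.
long long sum_bulk(const thrust::device_vector<int>& D)
{
    thrust::host_vector<int> H(D.size());
    thrust::copy(D.begin(), D.end(), H.begin());  // single bulk transfer
    long long total = 0;
    for (std::size_t i = 0; i < H.size(); ++i)
        total += H[i];
    return total;
}

int main()
{
    thrust::device_vector<int> D(1000, 1);  // 1000 elements, all equal to 1
    // Both helpers compute the same sum; only the transfer pattern differs.
    return (sum_elementwise(D) == sum_bulk(D)) ? 0 : 1;
}
```

Both functions return the same value; the difference is that the first issues one transfer per element while the second issues one transfer in total.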

There's no faster way to copy data from the GPU to the CPU than calling cudaMemcpy, because it is the most primitive transfer operation available to a CUDA programmer.
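If you want to call cudaMemcpy yourself on data held in a device_vector, one possible sketch is to obtain the raw device pointer with `thrust::raw_pointer_cast` and copy into ordinary host memory. The helper name `copy_to_host` and the choice to throw on failure are assumptions made for this example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies a device_vector into a std::vector
// with a single explicit cudaMemcpy call.
std::vector<int> copy_to_host(const thrust::device_vector<int>& D)
{
    std::vector<int> h(D.size());
    if (D.empty())
        return h;  // nothing to transfer

    // Raw device pointer to the vector's storage
    const int* d_ptr = thrust::raw_pointer_cast(D.data());

    cudaError_t err = cudaMemcpy(h.data(), d_ptr,
                                 D.size() * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        throw std::runtime_error(cudaGetErrorString(err));
    return h;
}

int main()
{
    thrust::device_vector<int> D(8, 7);  // 8 elements, all equal to 7
    std::vector<int> h = copy_to_host(D);
    return (h.size() == 8 && h[0] == 7 && h[7] == 7) ? 0 : 1;
}
```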

If you have a device_vector that needs more processing, try to keep the data on the device and process it with Thrust algorithms or your own kernels. If you need to read only a few values from the device_vector, just access them directly with bracket notation. If you need to access more than a few values, copy the device_vector over to a host_vector and read the values from there.

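#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
thrust::device_vector<int> D;
...
// Copy the whole device_vector to the host in one transfer
thrust::host_vector<int> H = D;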
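As an illustration of keeping the data on the device, the following sketch squares a sequence and sums it with Thrust algorithms, so only the final scalar travels back to the host. It also reads a single value with bracket notation. The helper name `sum_of_squares` and the input values are made up for this example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

// Hypothetical helper: computes 0^2 + 1^2 + ... + (n-1)^2 entirely on the device.
long long sum_of_squares(int n)
{
    thrust::device_vector<long long> D(n);
    thrust::sequence(D.begin(), D.end());  // fill with 0, 1, ..., n-1

    // Element-wise D[i] = D[i] * D[i], executed on the device
    thrust::transform(D.begin(), D.end(), D.begin(), D.begin(),
                      thrust::multiplies<long long>());

    // Only the reduced scalar is returned to the host
    return thrust::reduce(D.begin(), D.end(), 0LL, thrust::plus<long long>());
}

int main()
{
    thrust::device_vector<int> D(4);
    thrust::sequence(D.begin(), D.end(), 10);  // 10, 11, 12, 13

    int first = D[0];  // reading one value with brackets: one small transfer

    // 0 + 1 + 4 + 9 = 14
    return (first == 10 && sum_of_squares(4) == 14) ? 0 : 1;
}
```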