Fastest way to access device vector elements directly on host

Posted by 瘦欲@ on 2019-12-07 08:42:48

Question


I refer you to the following page: http://code.google.com/p/thrust/wiki/QuickStartGuide#Vectors. Please see the second paragraph, where it says:

Also note that individual elements of a device_vector can be accessed using the standard bracket notation. However, because each of these accesses requires a call to cudaMemcpy, they should be used sparingly. We'll look at some more efficient techniques later.

I searched the whole document but could not find the more efficient technique. Does anyone know the fastest way to do this? That is, what is the fastest way to access a device vector (or device pointer) from the host?


Answer 1:


The "more efficient techniques" the guide alludes to are the Thrust algorithms. It's more efficient to access (or copy across the PCI-E bus) millions of elements at once than it is to access a single element because the fixed cost of CPU/GPU communication is amortized.
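To illustrate the point about Thrust algorithms, here is a minimal sketch in which the work stays on the GPU and only a single result crosses the bus. The helper name device_sum and the data (a sequence 0, 1, 2, ...) are made up for illustration and are not from the guide.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <cstdio>

// Hypothetical helper: fills a device_vector with 0..n-1 and sums it on the GPU.
long long device_sum(int n) {
    thrust::device_vector<int> d(n);
    thrust::sequence(d.begin(), d.end());  // data is generated on the device

    // The reduction runs on the device; only the final value is returned
    // to the host. The 0LL initial value makes the accumulator a long long.
    return thrust::reduce(d.begin(), d.end(), 0LL);
}

int main() {
    std::printf("sum = %lld\n", device_sum(1000000));
    return 0;
}
```

Compare this with summing in a host loop over d[i], which would issue one small transfer per element.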

There's no faster way to copy data from the GPU to the CPU than by calling cudaMemcpy, because it is the most primitive way for a CUDA programmer to implement the task.
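For reference, a bulk copy done directly with cudaMemcpy might look like the sketch below. It assumes the data lives in a thrust::device_vector and uses thrust::raw_pointer_cast to get at the underlying device pointer; the helper name download and the sample data are hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: copies a whole device_vector to the host in one call.
// Returns an empty vector if the copy fails.
std::vector<int> download(const thrust::device_vector<int>& d) {
    std::vector<int> h(d.size());

    // Raw device pointer to the storage managed by the device_vector.
    const int* d_ptr = thrust::raw_pointer_cast(d.data());

    // A single cudaMemcpy moves the entire buffer from device to host.
    cudaError_t err = cudaMemcpy(h.data(), d_ptr, h.size() * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        h.clear();
    }
    return h;
}

int main() {
    thrust::device_vector<int> d(1024, 7);  // sample data: 1024 copies of 7
    std::vector<int> h = download(d);
    std::printf("copied %zu elements\n", h.size());
    return 0;
}
```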




Answer 2:


If you have a device_vector that needs more processing, try to keep the data on the device and process it with Thrust algorithms or your own kernels. If you need to read only a few values from the device_vector, just access them directly with bracket notation. If you need to access more than a few values, copy the device_vector over to a host_vector and read the values from there.

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

thrust::device_vector<int> D;
...
thrust::host_vector<int> H = D;  // one bulk device-to-host copy
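If only part of the vector is needed on the host, the same idea applies to a sub-range. The sketch below contrasts bracket access for a couple of values with one thrust::copy for a contiguous range; the helper name copy_range, the sizes, and the indices are made up for illustration.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copies elements [begin, end) of D to the host
// with a single thrust::copy call.
thrust::host_vector<int> copy_range(const thrust::device_vector<int>& D,
                                    std::size_t begin, std::size_t end) {
    thrust::host_vector<int> H(end - begin);
    thrust::copy(D.begin() + begin, D.begin() + end, H.begin());
    return H;
}

int main() {
    thrust::device_vector<int> D(1000);
    thrust::sequence(D.begin(), D.end());  // sample data: 0, 1, 2, ...

    // A few values: bracket notation is fine, but each read is its own
    // small device-to-host transfer.
    int first = D[0];
    int last = D[D.size() - 1];

    // Many values: copy the range once, then read from host memory.
    thrust::host_vector<int> H = copy_range(D, 200, 300);

    std::printf("first=%d last=%d H[0]=%d\n", first, last, H[0]);
    return 0;
}
```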


Source: https://stackoverflow.com/questions/8660333/fastest-way-to-access-device-vector-elements-directly-on-host
