CUDA Zero Copy memory considerations

清歌不尽 2021-01-04 13:18

I am trying to figure out if using cudaHostAlloc (or cudaMallocHost?) is appropriate.

I am trying to run a kernel where my input data is more than the amount available on the device.

5 Answers
  •  感动是毒
    2021-01-04 14:09

    Neither the CUDA C Programming Guide nor the CUDA Best Practices Guide mentions that the amount allocated by cudaMallocHost can't be bigger than the device memory, so I conclude it's possible.

    Data transfers from page-locked memory to the device are faster than normal data transfers, and even faster if you use write-combined memory. Also, memory allocated this way can be mapped into the device address space, eliminating the need to (manually) copy the data at all. The transfers happen automatically as the data is needed, so you should be able to process more data than fits into device memory.
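
    A minimal sketch (not from the original answer) of that mapped, page-locked setup: cudaHostAlloc with cudaHostAllocMapped plus cudaHostGetDevicePointer lets the kernel read and write host memory directly, with the transfers happening on demand. The scale kernel, the buffer size, and the omission of error checking are illustrative assumptions.

        // Zero-copy sketch: pinned host memory mapped into the device address space.
        #include <cuda_runtime.h>
        #include <stdio.h>

        __global__ void scale(const float *in, float *out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                out[i] = 2.0f * in[i];   // each access is fetched over the bus on demand
        }

        int main(void)
        {
            const int n = 1 << 20;
            float *h_in, *h_out;   // host pointers (page-locked, mapped)
            float *d_in, *d_out;   // device-side aliases of the same allocations

            cudaSetDeviceFlags(cudaDeviceMapHost);   // enable mapped pinned memory

            // Allocate pinned host memory that is mapped into device memory space.
            cudaHostAlloc((void **)&h_in,  n * sizeof(float), cudaHostAllocMapped);
            cudaHostAlloc((void **)&h_out, n * sizeof(float), cudaHostAllocMapped);

            for (int i = 0; i < n; ++i) h_in[i] = (float)i;

            // Get device pointers that refer to the mapped host allocations.
            cudaHostGetDevicePointer((void **)&d_in,  h_in,  0);
            cudaHostGetDevicePointer((void **)&d_out, h_out, 0);

            scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
            cudaDeviceSynchronize();   // results are now visible in h_out, no cudaMemcpy

            printf("h_out[10] = %f\n", h_out[10]);

            cudaFreeHost(h_in);
            cudaFreeHost(h_out);
            return 0;
        }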

    However, system performance (of the host) can suffer greatly if the page-locked amount makes up a significant part of the host memory.

    So when should you use this technique? Simply put: if the data needs to be read only once and written only once, use it. It will yield a performance gain, since one would have to copy the data back and forth at some point anyway. But as soon as you need to store intermediate results that don't fit into registers or shared memory, process chunks of your data that fit into device memory with cudaMalloc instead.
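
    A minimal sketch of that chunked alternative, assuming a hypothetical process_chunk kernel and a host buffer h_data that is already populated; error handling and overlap with streams are left out.

        // Stream a large host buffer through device memory in cudaMalloc'd chunks.
        #include <cuda_runtime.h>
        #include <stddef.h>

        __global__ void process_chunk(float *buf, size_t n)
        {
            size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
            if (i < n)
                buf[i] += 1.0f;   // stand-in for work that keeps intermediate state on the device
        }

        void process_all(float *h_data, size_t total, size_t chunk)
        {
            float *d_buf;
            cudaMalloc((void **)&d_buf, chunk * sizeof(float));   // chunk fits in device memory

            for (size_t off = 0; off < total; off += chunk) {
                size_t n = (total - off < chunk) ? (total - off) : chunk;

                cudaMemcpy(d_buf, h_data + off, n * sizeof(float), cudaMemcpyHostToDevice);
                process_chunk<<<(unsigned int)((n + 255) / 256), 256>>>(d_buf, n);
                cudaMemcpy(h_data + off, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
            }
            cudaFree(d_buf);
        }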
