CUDA Zero Copy memory considerations

清歌不尽 2021-01-04 13:18

I am trying to figure out if using cudaHostAlloc (or cudaMallocHost?) is appropriate.

I am trying to run a kernel where my input data is more than the amount available on the device.

5 Answers
  •  感动是毒
    2021-01-04 14:09

    Neither the CUDA C Programming Guide nor the CUDA Best Practices Guide mentions that the amount allocated by cudaMallocHost can't be bigger than the device memory, so I conclude it's possible.

    Data transfers from page-locked memory to the device are faster than normal data transfers, and even faster if you use write-combined memory. Also, memory allocated this way can be mapped into the device address space, eliminating the need to (manually) copy the data at all. The transfers happen automatically as the data is needed, so you should be able to process more data than fits into device memory.
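
    A minimal sketch (not from the original answer) of that mapped, page-locked setup: cudaHostAlloc with cudaHostAllocMapped plus cudaHostGetDevicePointer lets the kernel read and write host memory directly, with the transfers happening on demand. The scale kernel, the buffer size, and the omission of error checking are illustrative assumptions.

        // Zero-copy sketch: pinned host memory mapped into the device address space.
        #include <cuda_runtime.h>
        #include <stdio.h>

        __global__ void scale(const float *in, float *out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                out[i] = 2.0f * in[i];   // each access is fetched over the bus on demand
        }

        int main(void)
        {
            const int n = 1 << 20;
            float *h_in, *h_out;   // host pointers (page-locked, mapped)
            float *d_in, *d_out;   // device-side aliases of the same allocations

            cudaSetDeviceFlags(cudaDeviceMapHost);   // enable mapped pinned memory

            // Allocate pinned host memory that is mapped into device memory space.
            cudaHostAlloc((void **)&h_in,  n * sizeof(float), cudaHostAllocMapped);
            cudaHostAlloc((void **)&h_out, n * sizeof(float), cudaHostAllocMapped);

            for (int i = 0; i < n; ++i) h_in[i] = (float)i;

            // Get device pointers that refer to the mapped host allocations.
            cudaHostGetDevicePointer((void **)&d_in,  h_in,  0);
            cudaHostGetDevicePointer((void **)&d_out, h_out, 0);

            scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
            cudaDeviceSynchronize();   // results are now visible in h_out, no cudaMemcpy

            printf("h_out[10] = %f\n", h_out[10]);

            cudaFreeHost(h_in);
            cudaFreeHost(h_out);
            return 0;
        }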

    However, system performance (of the host) can suffer greatly if the page-locked amount makes up a significant part of the host memory.

    So when should you use this technique? Simply put: if the data needs to be read only once and written only once, use it. It will yield a performance gain, since one would have to copy the data back and forth at some point anyway. But as soon as you need to store intermediate results that don't fit into registers or shared memory, process chunks of your data that fit into device memory with cudaMalloc instead.
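
    A minimal sketch of that chunked alternative, assuming a hypothetical process_chunk kernel and a host buffer h_data that is already populated; error handling and overlap with streams are left out.

        // Stream a large host buffer through device memory in cudaMalloc'd chunks.
        #include <cuda_runtime.h>
        #include <stddef.h>

        __global__ void process_chunk(float *buf, size_t n)
        {
            size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
            if (i < n)
                buf[i] += 1.0f;   // stand-in for work that keeps intermediate state on the device
        }

        void process_all(float *h_data, size_t total, size_t chunk)
        {
            float *d_buf;
            cudaMalloc((void **)&d_buf, chunk * sizeof(float));   // chunk fits in device memory

            for (size_t off = 0; off < total; off += chunk) {
                size_t n = (total - off < chunk) ? (total - off) : chunk;

                cudaMemcpy(d_buf, h_data + off, n * sizeof(float), cudaMemcpyHostToDevice);
                process_chunk<<<(unsigned int)((n + 255) / 256), 256>>>(d_buf, n);
                cudaMemcpy(h_data + off, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
            }
            cudaFree(d_buf);
        }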
