multi-gpu cuda: Run kernel on one device and modify elements on the other?

Submitted on 2019-12-23 04:28:26

Question


Suppose I have multiple GPUs in a machine and a kernel running on GPU0.

With the UVA and P2P features of CUDA 4.0, can I modify the contents of an array on another device, say GPU1, while the kernel is running on GPU0?

The simpleP2P example in the CUDA 4.0 SDK does not demonstrate this.

It only demonstrates:

  • Peer-to-peer memcopies
  • A kernel running on GPU0 which reads input from a GPU1 buffer and writes output to a GPU0 buffer
  • A kernel running on GPU1 which reads input from a GPU0 buffer and writes output to a GPU1 buffer
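For concreteness, what the question is asking about would look roughly like the sketch below: a kernel launched on GPU0 whose output pointer refers to memory allocated on GPU1, reachable through a UVA/P2P mapping. The kernel name and parameters here are illustrative, not taken from the simpleP2P sample.

```cuda
// Sketch only: a kernel launched on GPU0 that writes directly into a
// buffer allocated on GPU1, via a P2P-mapped unified virtual address.
__global__ void scaleKernel(float *peerBuf, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        peerBuf[i] *= factor;   // peerBuf resides in GPU1's memory
}
```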


Answer 1:


Short answer: Yes, you can.

Longer answer

The linked presentation gives full details, but here are the requirements:

  • Must be on a 64-bit OS (either Linux or Windows with the Tesla Compute Cluster driver).
  • GPUs must both be Compute Capability 2.0 (sm_20) or higher.
  • Currently the GPUs must be attached to the same IOH (I/O Hub).

You can use cudaDeviceCanAccessPeer() to query whether direct P2P access is possible.
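Putting those pieces together, a minimal host-side sketch might look like the following: query peer access with cudaDeviceCanAccessPeer(), enable it from GPU0 with cudaDeviceEnablePeerAccess(), then launch a kernel on GPU0 that writes into a buffer allocated on GPU1. This assumes a machine with at least two P2P-capable GPUs on the same IOH; the kernel name fillKernel is illustrative. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: writes into a buffer that lives on the *other* GPU.
__global__ void fillKernel(int *buf, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main()
{
    // Ask whether GPU0 can directly address GPU1's memory.
    int canAccess01 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    if (!canAccess01) {
        printf("Direct P2P access from GPU0 to GPU1 is not supported.\n");
        return 1;
    }

    // Allocate the target buffer on GPU1.
    const int n = 1 << 20;
    int *d_gpu1 = nullptr;
    cudaSetDevice(1);
    cudaMalloc(&d_gpu1, n * sizeof(int));

    // Switch to GPU0, enable peer access to GPU1 (flags must be 0),
    // and launch a kernel that writes into GPU1's buffer through UVA.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    fillKernel<<<(n + 255) / 256, 256>>>(d_gpu1, n, 42);
    cudaDeviceSynchronize();

    // Clean up on the owning device.
    cudaSetDevice(1);
    cudaFree(d_gpu1);
    return 0;
}
```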



Source: https://stackoverflow.com/questions/9232469/multi-gpu-cuda-run-kernel-on-one-device-and-modify-elements-on-the-other
