CUDA C programming with 2 video cards

Submitted by 懵懂的女人 on 2019-12-04 05:03:01

There is no "automatic" way to run a CUDA kernel on multiple GPUs.

You will need to devise a way to decompose the matrix multiplication problem into independent operations that can run in parallel, one on each GPU. As a simple example:

C = A.B is equivalent to C = A.[B1|B2] = [A.B1|A.B2], where B1 and B2 are suitably sized matrices containing the columns of the matrix B, and | denotes columnwise concatenation. You can calculate A.B1 and A.B2 as separate matrix multiplication operations, and then perform the concatenation when copying the resulting submatrices back to host memory.

Once you have a suitable decomposition scheme, you then implement it using the standard multi-GPU facilities in the CUDA 4.x API. For a great overview of multi-GPU programming using the CUDA APIs, I recommend watching Paulius Micikevicius' excellent talk from GTC 2012, which is available as a streaming video and PDF here.

The basics are described in the CUDA C Programming Guide under section 3.2.6.

Basically, you can select which GPU your current host thread operates on by calling cudaSetDevice(). Still, you have to write your own code to decompose your routines so they can be split across multiple GPUs.
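Putting the two ideas together, a hedged host-side sketch might look like the following. This is not a complete program: the kernel, the matrix size N, and the device buffer arrays (dA, dB, dC) are illustrative placeholders, and allocation, error checking, and the exact copy sizes are elided. It only shows the cudaSetDevice() pattern for dispatching one half of B to each GPU.

```cuda
// Hypothetical skeleton: compute C = [A.B1 | A.B2] on two GPUs.
// Each half of B's columns is multiplied by the full A on its own device.
__global__ void matmul(const float *A, const float *B, float *C,
                       int n, int halfCols)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < halfCols) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * halfCols + col];
        C[row * halfCols + col] = sum;
    }
}

// Host side (inside some function; buffers assumed already allocated
// per device and halves of B already staged in hB[0] / hB[1]):
for (int dev = 0; dev < 2; ++dev) {
    cudaSetDevice(dev);  // subsequent runtime calls target this GPU
    cudaMemcpyAsync(dA[dev], hA, bytesA, cudaMemcpyHostToDevice);
    cudaMemcpyAsync(dB[dev], hB[dev], bytesHalfB, cudaMemcpyHostToDevice);
    matmul<<<grid, block>>>(dA[dev], dB[dev], dC[dev], N, N / 2);
    cudaMemcpyAsync(hC[dev], dC[dev], bytesHalfC, cudaMemcpyDeviceToHost);
}
// Wait for both devices before using the concatenated result in hC[0]/hC[1]
for (int dev = 0; dev < 2; ++dev) {
    cudaSetDevice(dev);
    cudaDeviceSynchronize();
}
```

Because the copies and kernel launch are asynchronous with respect to the host, both GPUs do their work concurrently; the final loop synchronizes each device before the host reads the submatrices back out.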
