How does CUDA assign device IDs to GPUs?

*爱你&永不变心* submitted on 2019-11-27 08:36:10

By default, CUDA tries to pick the fastest device as device 0 (the FASTEST_FIRST ordering, which relies on a heuristic). So when you swap GPUs in and out, the ordering might change completely. It might be better to pick GPUs based on their PCI bus ID using:

cudaError_t cudaDeviceGetByPCIBusId ( int* device, const char* pciBusId )
   Returns a handle to a compute device.

cudaError_t cudaDeviceGetPCIBusId ( char* pciBusId, int  len, int  device )
   Returns a PCI Bus Id string for the device.

or the CUDA Driver API equivalents, cuDeviceGetByPCIBusId and cuDeviceGetPCIBusId.

But IMO the most reliable way to know which device is which would be to use NVML or nvidia-smi to get each device's unique identifier (UUID) using nvmlDeviceGetUUID, and then match it to the CUDA device by PCI bus ID, which NVML reports via nvmlDeviceGetPciInfo.
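One practical detail when matching by PCI bus ID: as far as I know, NVML can report the domain with different zero padding (and letter case) than the CUDA runtime does, so comparing the two strings byte for byte may fail even for the same card. A minimal sketch of a more forgiving comparison follows. It assumes the usual domain:bus:device.function hexadecimal layout; the names PciAddress, parsePciBusId, samePciDevice and checkExamples are hypothetical helpers, not part of CUDA or NVML, and the sample strings are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Numeric form of a PCI address: domain:bus:device.function.
struct PciAddress {
    unsigned domain = 0, bus = 0, device = 0, function = 0;
};

bool operator==(const PciAddress& a, const PciAddress& b) {
    return a.domain == b.domain && a.bus == b.bus &&
           a.device == b.device && a.function == b.function;
}

// Parses "domain:bus:device.function" (hexadecimal fields) into `out`.
// Returns false if the text does not have exactly that shape.
bool parsePciBusId(const std::string& text, PciAddress& out) {
    char extra = 0;
    const int fields = std::sscanf(text.c_str(), "%x:%x:%x.%x%c",
                                   &out.domain, &out.bus, &out.device,
                                   &out.function, &extra);
    return fields == 4;  // 5 would mean trailing characters
}

// True if both strings parse and name the same PCI address.
bool samePciDevice(const std::string& a, const std::string& b) {
    PciAddress pa, pb;
    return parsePciBusId(a, pa) && parsePciBusId(b, pb) && pa == pb;
}

// Illustrative strings only, not output captured from a real machine.
void checkExamples() {
    assert(samePciDevice("0000:03:00.0", "00000000:03:00.0"));  // padding differs
    assert(samePciDevice("0000:0a:00.0", "0000:0A:00.0"));      // case differs
    assert(!samePciDevice("0000:03:00.0", "0000:04:00.0"));     // different bus
    assert(!samePciDevice("not-a-bus-id", "0000:03:00.0"));     // unparsable
}
```

In a real program you would pass in the string filled by cudaDeviceGetPCIBusId and the bus ID string from the structure filled by nvmlDeviceGetPciInfo, then use the NVML handle of the matching device to read its UUID.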

Set the environment variable CUDA_DEVICE_ORDER as:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

Then the GPU IDs will be ordered by PCI bus ID.
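If exporting the variable in the shell is inconvenient, a program can also set it for itself, provided this happens before the first CUDA call, since the variable is read when CUDA initializes. A minimal sketch, assuming a POSIX system (setenv is not standard C++); requestPciBusOrder is a hypothetical helper name, not a CUDA function.

```cpp
#include <cassert>
#include <cstring>
#include <stdlib.h>  // setenv/getenv; setenv is POSIX, not standard C++

// Hypothetical helper: call it before the first CUDA runtime or driver call.
// Returns true if CUDA_DEVICE_ORDER is PCI_BUS_ID afterwards.
bool requestPciBusOrder() {
    // Third argument 1 = overwrite any value inherited from the shell.
    if (setenv("CUDA_DEVICE_ORDER", "PCI_BUS_ID", 1) != 0) {
        return false;  // setenv failed, e.g. out of memory
    }
    const char* value = getenv("CUDA_DEVICE_ORDER");
    assert(value != nullptr);  // setenv succeeded, so the variable must exist
    return std::strcmp(value, "PCI_BUS_ID") == 0;
}
```

Call requestPciBusOrder() at the very top of main. Passing 0 instead of 1 as the last setenv argument would keep a value the user has already exported, which may be the friendlier choice for a command-line tool.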

The "CUDA Support/Choosing a GPU" page suggests that

when running a CUDA program on a machine with multiple GPUs, by default CUDA kernels will execute on whichever GPU is installed in the primary graphics card slot.

Also, the discussion at "No GPU selected, code working properly, how's this possible?" suggests that CUDA does not, in general, map the "best" card to device 0.

EDIT

Today I set up a PC with a Tesla C2050 card for computation and a 8084 GS card for visualization, and swapped their positions between the first two PCI-E slots. Using deviceQuery, I noticed that GPU 0 is always the card in the first PCI slot and GPU 1 is always the card in the second PCI slot. I do not know whether this holds in general, but it shows that on my system GPUs are numbered not according to their "power" but according to their positions.
