Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName()

Anonymous (unverified), submitted 2019-12-03 01:18:02

Question:

I run this command in a shell and get:

C:\Users\me>nvidia-smi -L
GPU 0: Quadro K2000 (UUID: GPU-b1ac50d1-019c-58e1-3598-4877fddd3f17)
GPU 1: Quadro 2000 (UUID: GPU-1f22a253-c329-dfb7-0db4-e005efb6a4c7)

But in my code, when I call cuDeviceGetName(.., ID) with the ID taken from the nvidia-smi output, the devices are swapped: GPU 0 becomes Quadro 2000 and GPU 1 becomes Quadro K2000.

Is this expected behavior or a bug? Does anyone know a workaround to make nvidia-smi report the 'real' IDs of the GPUs? I could use the UUID to get the proper device with nvmlDeviceGetUUID(), but using the NVML API seems a bit too complicated for what I'm trying to achieve.

This question discusses how CUDA assigns IDs to devices, without reaching a clear conclusion.

I am using CUDA 6.5.

EDIT: I've had a look at the nvidia-smi manpage (should have done that earlier...). It states:

"It is recommended that users desiring consistency use either UUID or PCI bus ID, since device enumeration ordering is not guaranteed to be consistent"

Still looking for a kludge...
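
Following the manpage's recommendation, one possible kludge is to pull the UUID out of the `nvidia-smi -L` output and key on that instead of the index. The Python sketch below is only an illustration: the helper name `parse_gpu_list` is made up for this example, and the regular expression assumes the line layout shown in the output above.

```python
import re

# Matches lines such as:
#   GPU 0: Quadro K2000 (UUID: GPU-b1ac50d1-019c-58e1-3598-4877fddd3f17)
# The layout is assumed from the output quoted in the question.
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_list(text):
    """Return (index, name, uuid) tuples parsed from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            index, name, uuid = match.groups()
            gpus.append((int(index), name, uuid))
    return gpus


# Sample text copied from the question; on a real machine this would be the
# captured stdout of `nvidia-smi -L`.
sample = """\
GPU 0: Quadro K2000 (UUID: GPU-b1ac50d1-019c-58e1-3598-4877fddd3f17)
GPU 1: Quadro 2000 (UUID: GPU-1f22a253-c329-dfb7-0db4-e005efb6a4c7)
"""

uuid_by_name = {name: uuid for _, name, uuid in parse_gpu_list(sample)}
print(uuid_by_name["Quadro K2000"])
```

The UUID obtained this way stays the same regardless of which index either tool assigns to the card.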

Answer 1:

It's expected behavior.

nvidia-smi enumerates in PCI order.

By default, the CUDA driver and runtime APIs do not.

The question you linked clearly shows how to associate the two numbering/ordering schemes.

There is no way to make nvidia-smi change its ordering to match whatever the CUDA runtime or driver APIs generate. However, you can change the CUDA runtime enumeration order through an environment variable in CUDA 8.
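
As a rough illustration of associating the two orderings, the PCI bus ID can serve as the common key. The Python sketch below assumes you have already collected one bus ID string per CUDA ordinal (for example with cuDeviceGetPCIBusId) and the text printed by `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader`. The function names and the bus IDs in the sample are hypothetical. The two sources may print the PCI domain with a different number of digits, so the IDs are compared as parsed numbers rather than as strings.

```python
def normalize_bus_id(bus_id):
    """Turn 'domain:bus:device.function' into a tuple of ints, so that
    '00000000:01:00.0' and '0000:01:00.0' compare equal."""
    domain, bus, rest = bus_id.strip().split(":")
    device, function = rest.split(".")
    return tuple(int(part, 16) for part in (domain, bus, device, function))


def cuda_to_smi_index(cuda_bus_ids, smi_csv):
    """Map each CUDA ordinal to the nvidia-smi index with the same bus ID.

    cuda_bus_ids: bus ID strings, where position i belongs to CUDA device i.
    smi_csv: rows of the form 'index, pci.bus_id'.
    """
    smi_index = {}
    for row in smi_csv.strip().splitlines():
        index, bus_id = (field.strip() for field in row.split(","))
        smi_index[normalize_bus_id(bus_id)] = int(index)
    return {ordinal: smi_index[normalize_bus_id(bus_id)]
            for ordinal, bus_id in enumerate(cuda_bus_ids)}


# Hypothetical data for a two-GPU machine where CUDA swaps the devices.
smi_csv = """\
0, 00000000:01:00.0
1, 00000000:02:00.0
"""
cuda_bus_ids = ["0000:02:00.0", "0000:01:00.0"]

print(cuda_to_smi_index(cuda_bus_ids, smi_csv))
```

With this sample data, CUDA device 0 corresponds to nvidia-smi GPU 1 and vice versa.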



Answer 2:

You can make CUDA enumerate devices by PCI bus ID, instead of the default fastest-card-first order, by setting an environment variable in your shell. This requires CUDA 7 or later.

export CUDA_DEVICE_ORDER=PCI_BUS_ID
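
The export line above is for a Unix-style shell, while the question shows a Windows prompt. One shell-independent option is to set the variable from a launcher script before the CUDA application starts. The Python sketch below is a minimal illustration under that assumption: `env_with_pci_order` is a made-up helper, and the child process is a stand-in that only echoes the variable where a real CUDA program would go.

```python
import os
import subprocess
import sys


def env_with_pci_order(base_env=None):
    """Return a copy of the environment with CUDA_DEVICE_ORDER=PCI_BUS_ID."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


# Stand-in child process: it just prints the variable it inherited.
# In practice this command would be the CUDA application itself.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_DEVICE_ORDER'])"]
result = subprocess.run(child, env=env_with_pci_order(),
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

The variable has to be in place before the process initializes CUDA, which is why the sketch passes it to the child at launch rather than changing it afterwards.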



Answer 3:

It's the expected behaviour.

The nvidia-smi manpage describes its device index as

"the GPU/Unit's 0-based index in the natural enumeration returned by the driver",

whereas the CUDA API enumerates devices in descending order of compute capability, according to section 3.2.6.1, "Device Enumeration", of the Programming Guide.

I ran into this problem and wrote a program that is an analog of nvidia-smi but lists devices in an order consistent with the CUDA API. The program is available here:

https://github.com/smilart/nvidia-cdl

I wrote it because nvidia-smi cannot enumerate devices in an order consistent with the CUDA API.


