Tensorflow only sees XLA_GPUs and cannot use them

Submitted by 一曲冷凌霜 on 2019-12-23 04:56:32

Question


I have a machine with 8 GPUs (4x GTX 1080 Ti with 11 GB of RAM each, and 4x RTX 1080), and I cannot get TensorFlow to use them correctly (or at all).

When I do

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

it prints

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5295519098812813462
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 12186007115805339517
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17706271046686153881
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 14710290295129432533
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:3"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1381213064943868400
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:4"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 12093982778662340719
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:5"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 682960671898108683
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:6"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9901240111105546679
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:7"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 8442134369143872649
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 1687638086072792879
physical_device_desc: "device: XLA_CPU device"
].

If I try to use the GPUs for anything, nvidia-smi shows them as occupied but running at 0% utilization, and the speed of the task shows that TensorFlow is only using the CPU. On other machines with the same setup, the device list also includes '/device:GPU:2' alongside '/device:XLA_GPU:2' (for instance), and TensorFlow can use them without any problem.

I have already looked at similar problems and their solutions, but none of them seems to work.
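When TensorFlow registers only XLA_GPU devices and no plain GPU devices, a common cause is that it failed to load the CUDA runtime or cuDNN shared libraries at import time and silently fell back to the CPU. As a rough, TensorFlow-independent diagnostic, the sketch below probes whether those libraries are resolvable by the dynamic linker; the library names are an assumption about what a typical TensorFlow-GPU build links against:

```python
import ctypes
import ctypes.util

def probe_cuda_libs(names=("cudart", "cublas", "cudnn")):
    """Try to locate and load the CUDA libraries a GPU build of TF needs.

    Returns a dict mapping each library name to its resolved path, or
    None if the dynamic linker cannot find or load it. (The exact set
    of required libraries depends on the TensorFlow build.)
    """
    results = {}
    for name in names:
        path = ctypes.util.find_library(name)
        if path is not None:
            try:
                ctypes.CDLL(path)  # verify the library actually loads
            except OSError:
                path = None
        results[name] = path
    return results

if __name__ == "__main__":
    for lib, path in probe_cuda_libs().items():
        print(f"lib{lib}: {path or 'NOT FOUND'}")
```

If these come back NOT FOUND even though nvidia-smi works, the driver itself is fine but the CUDA toolkit/cuDNN libraries are missing from the linker path (e.g. LD_LIBRARY_PATH on Linux), which matches the symptom of TensorFlow listing only XLA devices and running on the CPU.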


Answer 1:


Most likely you have an incompatible CUDA version installed. If you installed TensorFlow with pip, check https://www.tensorflow.org/install/gpu for your TensorFlow version and the corresponding CUDA version (and cuDNN version too). Make sure the versions of TensorFlow, CUDA, and cuDNN you have installed match each other. Alternatively, you could build TensorFlow from source, but I have less experience with that, so you may want to look it up yourself :) Good luck!
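To illustrate the version matching the answer describes, here is a small lookup over a few tested TensorFlow-GPU / CUDA / cuDNN combinations. The entries are transcribed from the tested-configurations table on tensorflow.org around the time of this question; treat them as assumptions and verify against that page for your exact release:

```python
# A few tested TensorFlow-GPU builds and the CUDA/cuDNN they expect.
# (Transcribed from tensorflow.org's tested-configurations table;
# double-check the page for your exact release.)
TESTED_CONFIGS = {
    "1.13.1": {"cuda": "10.0", "cudnn": "7.4"},
    "1.14.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.0.0":  {"cuda": "10.0", "cudnn": "7.4"},
    "2.1.0":  {"cuda": "10.1", "cudnn": "7.6"},
}

def required_versions(tf_version):
    """Return the CUDA/cuDNN versions a given TF release was tested with."""
    try:
        return TESTED_CONFIGS[tf_version]
    except KeyError:
        raise ValueError(f"no tested config recorded for TF {tf_version}")

print(required_versions("1.14.0"))  # {'cuda': '10.0', 'cudnn': '7.4'}
```

The point of the table: a pip-installed TensorFlow-GPU wheel only finds the GPUs if the system's CUDA toolkit and cuDNN match the versions that wheel was built against; a mismatch produces exactly the "XLA_GPU only, runs on CPU" symptom above.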



Source: https://stackoverflow.com/questions/57253395/tensorflow-only-sees-xla-gpus-and-cannot-use-them
