Question
TF2 is currently not detecting GPUs. I migrated from TF 1.14, where
tf.keras.utils.multi_gpu_model(model=model, gpus=2)
worked, but it now returns the following error:
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2', '/xla_gpu:3']. Try reducing `gpus`.
Running nvidia-smi returns the following information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:09:00.0 Off | 0 |
| N/A 46C P0 62W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:0A:00.0 Off | 0 |
| N/A 36C P0 71W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:86:00.0 Off | 0 |
| N/A 38C P0 58W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:87:00.0 Off | 0 |
| N/A 31C P0 82W / 149W | 0MiB / 11441MiB | 73% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Also, my TF version (built for CUDA) is:
2.0.0-rc0
Please let me know what I am doing wrong so I can fix it.
Answer 1:
I would suggest the following:

1. First check your CUDA version. Make sure it is 10.0.
2. If it is 10.0, check whether your TF build is the GPU version.
3. Check whether TF can access the GPUs using:
import tensorflow as tf

value = tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)
print('*** If TF can access GPU: ***\n\n', value)  # must return True if it can
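In TF 2.x you can also query the visible devices directly through the `tf.config` API; a minimal sketch of that check (assuming a TF 2.0 install):

```python
import tensorflow as tf

# List the physical GPUs TensorFlow can see; an empty list means
# no CUDA-visible GPU was detected.
gpus = tf.config.experimental.list_physical_devices('GPU')
print('GPUs visible to TF:', gpus)
```

If this prints an empty list while nvidia-smi shows the cards, the problem is the CUDA/TF version pairing rather than the hardware.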
4. I assume the first two points are already taken care of by you. If TF can also access your GPUs then, as you can see in your ValueError, it actually has the names of the GPUs. I cannot say much about the tf.keras.utils.multi_gpu_model() function because I did not use it in TF, but I would suggest you use with tf.device('/gpu:0'): and call or define your model inside that block.
5. If point 4 also doesn't work, then just add the following lines
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # indices of the GPUs to expose
at the top of your file, and remove with tf.device('/gpu:0').
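Note that in TF2 the recommended multi-GPU path is tf.distribute.MirroredStrategy rather than multi_gpu_model. A minimal sketch (the layer sizes and architecture here are hypothetical placeholders, not from the question):

```python
import tensorflow as tf

# MirroredStrategy uses all visible GPUs by default; with no GPUs
# it falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Build and compile the model inside the strategy scope so its
    # variables are mirrored across the replicas.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

# model.fit(...) then splits each batch across the replicas.
```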
Answer 2:
CUDA should be version 10.0, not 10.1.
Source: https://stackoverflow.com/questions/57728052/tensorflow-2-0rc-not-detecting-gpus