Question
TF2 is currently not detecting GPUs. I migrated from TF 1.14, where
tf.keras.utils.multi_gpu_model(model=model, gpus=2)
worked, but it now returns the following error:
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2', '/xla_gpu:3']. Try reducing `gpus`.
Running nvidia-smi returns the following information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:09:00.0 Off | 0 |
| N/A 46C P0 62W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:0A:00.0 Off | 0 |
| N/A 36C P0 71W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:86:00.0 Off | 0 |
| N/A 38C P0 58W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:87:00.0 Off | 0 |
| N/A 31C P0 82W / 149W | 0MiB / 11441MiB | 73% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Also, my TF version (built for CUDA) is:
2.0.0-rc0
Please let me know what I am doing wrong so I can fix it.
Answer 1:
I would suggest the following:

1. First check your CUDA version. Make sure it is 10.0.
2. If it is 10.0, check whether your TF build is the GPU version.
3. Check whether TF can access the GPUs using:
import tensorflow as tf

value = tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)
print('*** If TF can access GPU: ***\n\n', value)  # must return True if it can
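In TF 2.x you can also query the visible devices directly through the `tf.config` API; a minimal sketch of that check (assuming a TF 2.0 install):

```python
import tensorflow as tf

# List the physical GPUs TensorFlow can see; an empty list means
# no CUDA-visible GPU was detected.
gpus = tf.config.experimental.list_physical_devices('GPU')
print('GPUs visible to TF:', gpus)
```

If this prints an empty list while nvidia-smi shows the cards, the problem is the CUDA/TF version pairing rather than the hardware.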
4. I assume the first two points are already taken care of by you. If TF can also access your GPUs then, as you can see in your ValueError, it actually has the names of the GPUs. I cannot say much about the tf.keras.utils.multi_gpu_model() function because I did not use it in TF, but I would suggest you use with tf.device('/gpu:0'): and call or define your model inside that block.
5. If point 4 also doesn't work, then just add the following lines
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # indices of the GPUs to expose
at the top of your file, and remove with tf.device('/gpu:0').
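Note that in TF2 the recommended multi-GPU path is tf.distribute.MirroredStrategy rather than multi_gpu_model. A minimal sketch (the layer sizes and architecture here are hypothetical placeholders, not from the question):

```python
import tensorflow as tf

# MirroredStrategy uses all visible GPUs by default; with no GPUs
# it falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Build and compile the model inside the strategy scope so its
    # variables are mirrored across the replicas.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

# model.fit(...) then splits each batch across the replicas.
```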
Answer 2:
CUDA should be version 10.0, not 10.1.
Source: https://stackoverflow.com/questions/57728052/tensorflow-2-0rc-not-detecting-gpus