How to make the GPU visible to the ML runtime environment on Databricks?

Submitted by 隐身守侯 on 2021-01-29 08:32:10

Question


I am trying to run some TensorFlow 2.2 example code on a Databricks GPU cluster (p2.xlarge) with the following environment:

Databricks Runtime 6.6 ML (Spark 2.4.5, GPU, Scala 2.11)
Keras version: 2.2.5

nvidia-smi
NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2         

I have checked https://docs.databricks.com/applications/deep-learning/single-node-training/tensorflow.html#install-tensorflow-22-on-databricks-runtime-66-ml&language-GPU

However, I do not want to run those shell commands every time the Databricks GPU cluster is restarted.

So I installed TensorFlow through the Databricks Libraries UI with:

  tensorflow==2.2.*

I did not specify whether it is the GPU or CPU build; I assumed it targets the GPU by default.

I found that the Python 3 code runs only on the CPU, not on the GPU:

  import tensorflow as tf

  physical_devices = tf.config.list_physical_devices()
  # [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
  #  PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
  #  PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')]

  visible_devices = tf.config.get_visible_devices()
  # [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

  tf.test.gpu_device_name()               # returns an empty string

  tf.test.is_built_with_cuda()            # True
  tf.test.is_built_with_gpu_support()     # True
  tf.test.is_built_with_rocm()            # False
  tf.test.is_built_with_xla()             # True
  tf.config.get_soft_device_placement()   # True
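The output above lists an XLA_GPU device but no plain GPU device, and only the CPU is visible. A small helper can make that distinction explicit. This is a minimal sketch; `diagnose_devices` and `report` are hypothetical names, and the interpretation of the "xla-gpu-only" case is an assumption: it often means TensorFlow detected the card but could not load the CUDA/cuDNN libraries it needs to register a regular GPU device (the TensorFlow startup log usually names the missing libraries).

```python
from collections import Counter


def diagnose_devices(devices):
    """Summarize a list of tf.config.PhysicalDevice-like objects.

    Only the `device_type` attribute is read, so the function can be
    exercised without TensorFlow or a GPU.
    """
    counts = Counter(d.device_type for d in devices)
    if counts["GPU"]:
        verdict = "gpu-ready"  # a regular GPU device is registered
    elif counts["XLA_GPU"]:
        # Assumption: the card is visible to XLA, but TensorFlow did not
        # register a GPU device (often missing or mismatched CUDA/cuDNN libs).
        verdict = "xla-gpu-only"
    else:
        verdict = "no-gpu"
    return dict(counts), verdict


def report():
    """Run the check on the current cluster (requires TensorFlow)."""
    import tensorflow as tf

    counts, verdict = diagnose_devices(tf.config.list_physical_devices())
    print(counts, verdict)
    return verdict
```

On the cluster described above, calling `report()` in a notebook cell would be expected to print the three device types and the "xla-gpu-only" verdict.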

I then tried to make the 'XLA_GPU' device visible to the ML runtime:

# https://www.tensorflow.org/api_docs/python/tf/config/set_visible_devices
# Try to make the GPU visible to the TF runtime
import tensorflow as tf

physical_devices = tf.config.list_physical_devices('XLA_GPU')
try:
    # Enable the first XLA_GPU device
    tf.config.set_visible_devices(physical_devices[0], 'XLA_GPU')  # exception raised here
    logical_devices = tf.config.list_logical_devices('XLA_GPU')
    # A logical device should now exist for the first GPU only
    assert len(logical_devices) == 1
except Exception as e:
    # e.g. invalid device, or virtual devices cannot be modified once initialized
    print('Could not make the device visible:', e)
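For comparison, the TensorFlow documentation linked in the snippet applies `set_visible_devices` to devices of type 'GPU', not 'XLA_GPU', and the call has to happen before the runtime is initialized (before any tensor or op is created). A minimal sketch of that pattern follows; `make_first_gpu_visible` is a hypothetical helper that takes the `tf.config` module as an argument only so it can be exercised without a GPU. If `list_physical_devices('GPU')` is empty, as in the question, there is nothing to make visible and the helper returns an empty list.

```python
def make_first_gpu_visible(config):
    """Restrict TensorFlow to the first physical 'GPU' device.

    `config` is expected to be `tf.config`. Call this before creating any
    tensors: once the runtime is initialized, set_visible_devices raises
    a RuntimeError.
    """
    gpus = config.list_physical_devices("GPU")
    if not gpus:
        # Only CPU/XLA devices are registered; visibility settings cannot fix that.
        return []
    config.set_visible_devices(gpus[0], "GPU")
    # One logical device should now exist for the first GPU.
    return config.list_logical_devices("GPU")
```

In a notebook cell this would be called as `make_first_gpu_visible(tf.config)` immediately after `import tensorflow as tf`.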

But this raised an exception.

How can I enable the GPU so that TF code runs on it?

Thanks.


Answer 1:


Install tensorflow-gpu instead of tensorflow, since that package runs primarily on the GPU, while tensorflow runs primarily on the CPU. You won't need to edit your code, because the package is still imported as tensorflow.
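After reinstalling, one way to confirm that ops actually run on the GPU is to create a tensor under `tf.device('/GPU:0')` and inspect its `.device` string. This is a minimal sketch; `gpu_smoke_test` is a hypothetical name, and the `tf` parameter exists only so the function can be exercised without TensorFlow installed.

```python
def gpu_smoke_test(tf=None):
    """Return the device string of a small matmul placed on GPU:0, or None.

    Call with no argument on the cluster to use the installed TensorFlow.
    """
    if tf is None:
        import tensorflow as tf
    if not tf.config.list_physical_devices("GPU"):
        return None  # no regular GPU device registered
    with tf.device("/GPU:0"):
        x = tf.random.uniform((256, 256))
        y = tf.matmul(x, x)
    # Eager tensors report where they were computed,
    # e.g. '/job:localhost/replica:0/task:0/device:GPU:0'.
    return y.device
```

Checking `.device` matters because soft device placement is enabled (the question shows `get_soft_device_placement` returning True), so TensorFlow may silently fall back to the CPU instead of failing when a requested GPU is unavailable.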



Source: https://stackoverflow.com/questions/62489900/how-to-enable-gpu-visible-for-ml-runtime-environment-on-databricks
