Question:

I've tried a number of different TensorFlow examples. They all work fine on the CPU but generate the same error when I try to run them on the GPU. One small example is this:

    import tensorflow as tf

    # Creates a graph.
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

    # Runs the op.
    print(sess.run(c))

The error is always the same, CUDA_ERROR_OUT_OF_MEMORY:

    I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
    I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
    I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
    I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
    I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
    I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 24
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 0 with properties:
    name: Tesla K80
    major: 3 minor: 7 memoryClockRate (GHz) 0.8235
    pciBusID 0000:0a:00.0
    Total memory: 11.25GiB
    Free memory: 105.73MiB
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 1 with properties:
    name: Tesla K80
    major: 3 minor: 7 memoryClockRate (GHz) 0.8235
    pciBusID 0000:0b:00.0
    Total memory: 11.25GiB
    Free memory: 133.48MiB
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0:   Y Y
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1:   Y Y
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:0a:00.0)
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:0b:00.0)
    I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 105.48MiB bytes.
    E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 105.48M (110608384 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Check failed: gpu_mem != nullptr  Could not allocate GPU device memory for device 0. Tried to allocate 105.48MiB
    Aborted (core dumped)

I guess that the problem has to do with my configuration rather than the memory usage of this tiny example. Does anyone have any idea?

Edit:

I've found out that the problem may be as simple as someone else running a job on the same GPU, which would explain the small amount of free memory. In that case: sorry for taking up your time...

Answer 1:

There appear to be two issues here:

1. By default, TensorFlow allocates a large fraction (95%) of the available GPU memory (on each GPU device) when you create a tf.Session. It uses a heuristic that reserves 200MB of GPU memory for "system" uses, but doesn't set this aside if the amount of free memory is smaller than that.
2. It looks like you have very little free GPU memory on either of your GPU devices (105.73MiB and 133.48MiB). This means that TensorFlow will attempt to allocate memory that should probably be reserved for the system, and hence the allocation fails.
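
The free-memory figures quoted above can be pulled out of the startup log automatically. This is only a minimal sketch, assuming the log format shown in the question; `free_memory_mib` and `starved_devices` are hypothetical helper names, and the 200 MiB default threshold merely mirrors the "system" reserve mentioned above (treating MB and MiB as interchangeable for this rough check).

```python
import re

# Pairs each "Found device N" header with the "Free memory" value that follows it.
# Assumes the log layout shown in the question; other TensorFlow versions may differ.
DEVICE_FREE_RE = re.compile(
    r"Found device (\d+) with properties:.*?Free memory: ([\d.]+)(MiB|GiB)",
    re.DOTALL,
)


def free_memory_mib(log_text):
    """Return {device_index: free memory in MiB} parsed from the log text."""
    free = {}
    for index, amount, unit in DEVICE_FREE_RE.findall(log_text):
        scale = 1024.0 if unit == "GiB" else 1.0
        free[int(index)] = float(amount) * scale
    return free


def starved_devices(log_text, threshold_mib=200.0):
    """Return the sorted indices of devices reporting less free memory than the threshold."""
    free = free_memory_mib(log_text)
    return sorted(i for i, mib in free.items() if mib < threshold_mib)


# Excerpt of the log from the question, flattened onto two lines.
sample_log = (
    "Found device 0 with properties:  name: Tesla K80 Total memory: 11.25GiB Free memory: 105.73MiB\n"
    "Found device 1 with properties:  name: Tesla K80 Total memory: 11.25GiB Free memory: 133.48MiB\n"
)
print(free_memory_mib(sample_log))
print(starved_devices(sample_log))
```

With the log from the question, both devices fall below the threshold, which is consistent with the allocation failure.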

Is it possible that you have another TensorFlow process (or some other GPU-hungry code) running while you attempt to run this program? For example, a Python interpreter with an open session, even if it is not using the GPU, will attempt to allocate almost all of the GPU memory.
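
One way to check whether something else is already holding the GPU memory is to ask the NVIDIA driver through `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH and reports plain integers for every field; `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names, and the sample rows are made-up values, not output captured from the machine in the question.

```python
import subprocess

# Query per-GPU memory usage in MiB as bare CSV rows: "index, used, total".
NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows into a list of dicts (values in MiB)."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "used_mib": int(used),
            "free_mib": int(total) - int(used),
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed rows. Requires an NVIDIA driver."""
    output = subprocess.check_output(NVIDIA_SMI_QUERY).decode("utf-8")
    return parse_gpu_memory(output)


# Hypothetical nvidia-smi output for two nearly full 11520 MiB devices.
sample_rows = "0, 11414, 11520\n1, 11386, 11520\n"
for row in parse_gpu_memory(sample_rows):
    print(row)
```

On the affected machine, calling `query_gpu_memory()` before starting TensorFlow would show whether the memory is already in use; running `nvidia-smi` with no arguments additionally lists the processes that hold it.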

Currently, the only way to restrict the amount of GPU memory that TensorFlow uses is the following configuration option (from this question):

    import tensorflow as tf

    # Assume that you have 12GB of GPU memory and want to allocate ~4GB:
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
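
The 0.333 in the snippet above is simply the desired amount divided by the total (4 / 12). A small sketch of that arithmetic follows; `memory_fraction` and `make_session` are hypothetical helper names, and `make_session` assumes the same TensorFlow 1.x-style API used in this answer (under TensorFlow 2.x the equivalent classes live in `tf.compat.v1`).

```python
def memory_fraction(target_gib, total_gib):
    """Return the fraction of total GPU memory corresponding to target_gib."""
    if total_gib <= 0 or not 0 < target_gib <= total_gib:
        raise ValueError("expected 0 < target_gib <= total_gib")
    return float(target_gib) / float(total_gib)


def make_session(target_gib, total_gib):
    """Create a session capped at roughly target_gib of GPU memory per device."""
    # Imported here so the arithmetic helper can be used without TensorFlow installed.
    import tensorflow as tf

    fraction = memory_fraction(target_gib, total_gib)
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
    return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))


# The example from the answer: ~4GB out of 12GB.
print(round(memory_fraction(4, 12), 3))
```

Note that the fraction is relative to the device's total memory, so a cap only helps if that much memory is actually free when the session is created.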

Source: Error using Tensorflow with GPU
