Question
I have been using tensorflow-gpu 1.13.1 on Ubuntu 18.04 with CUDA 10.0 on an Nvidia GeForce RTX 2070 (driver version 415.27).
The code below was used to manage TensorFlow's memory usage. I have about 8 GB of GPU memory, so TensorFlow must not allocate more than 1 GB of it. But when I look at memory usage with the nvidia-smi command, I see that it uses ~1.5 GB, despite the fact that I restricted the memory amount with GPUOptions.
import numpy as np
import tensorflow as tf

# graph, tensor_dict, image_tensor and image are defined elsewhere in the script
memory_config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.12))
memory_config.gpu_options.allow_growth = False
with tf.Session(graph=graph, config=memory_config) as sess:
    output_dict = sess.run(tensor_dict,
                           feed_dict={image_tensor: np.expand_dims(image, 0)})
Why is this happening? And how can I avoid it, or at least calculate the memory needs of every session? I need to enforce strict limits for every process, because I run several parallel instances with different sessions, so I need to be sure there will be no resource race.
BTW, I have tried setting memory_config.gpu_options.allow_growth to False, but it changes nothing. TensorFlow still allocates memory the same way regardless of this flag's value, which also seems strange.
Answer 1:
Solution
Try with gpu_options.allow_growth = True to see how much default memory is consumed by tf.Session creation. That memory is always allocated, regardless of the other settings.
Based on your result, it should be somewhere under 500 MB. So if you want each process to truly cap out at 1 GB, calculate the fraction as:
(1 GB minus default memory) / total_memory
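A minimal sketch of that calculation (memory_fraction is a hypothetical helper, and the 1,024 MB target, ~500 MB overhead and 8,192 MB total below are only illustrative values based on the question's 8 GB card):

def memory_fraction(target_mb, overhead_mb, total_mb):
    # Fraction to pass to per_process_gpu_memory_fraction so that the
    # reserved pool plus the fixed per-session overhead stays near target_mb.
    return max(target_mb - overhead_mb, 0) / float(total_mb)

# e.g. cap a process at ~1 GB on an 8 GB card with ~500 MB session overhead
print(memory_fraction(target_mb=1024, overhead_mb=500, total_mb=8192))  # ~0.064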
Reason
When you create a tf.Session, a TensorFlow device is created on the GPU regardless of your configuration, and this device requires some minimum amount of memory.
import tensorflow as tf

# Create a session with on-demand allocation only
conf = tf.ConfigProto()
conf.gpu_options.allow_growth = True
session = tf.Session(config=conf)
Given allow_growth=True, you might expect no GPU allocation at all. However, in reality it yields:
2019-04-05 18:44:43.460479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15127 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0)
which occupies a small fraction of memory (in my experience, the exact amount differs between GPU models). NOTE: setting allow_growth occupies almost the same amount of memory as setting per_process_gpu_memory_fraction=0.00001, but the latter won't be able to create the session properly.
In this case, it is 345 MB.
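If you would rather read that overhead programmatically than from the nvidia-smi table, one possible approach (a sketch; gpu_memory_of_this_process_mb is a hypothetical helper and it assumes nvidia-smi is available on the PATH) is to query nvidia-smi for the current process right after the session is created:

import os
import subprocess

import tensorflow as tf

def gpu_memory_of_this_process_mb():
    # Parse nvidia-smi's per-process accounting and return the MB used by our PID.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"]).decode()
    for line in out.strip().splitlines():
        pid, used = [field.strip() for field in line.split(",")]
        if int(pid) == os.getpid():
            return int(used)
    return 0

conf = tf.ConfigProto()
conf.gpu_options.allow_growth = True
session = tf.Session(config=conf)
print("default session overhead: %d MB" % gpu_memory_of_this_process_mb())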
That is the offset you are experiencing. Now let's look at the per_process_gpu_memory_fraction case:
conf = tf.ConfigProto()
conf.gpu_options.per_process_gpu_memory_fraction = 0.1
session = tf.Session(config=conf)
Since the GPU has 16,276 MB of memory, setting per_process_gpu_memory_fraction = 0.1 probably makes you think only about 1,627 MB will be allocated. But the truth is:
1,971 MB is allocated, which nevertheless coincides with the sum of the default memory (345 MB) and the requested memory (1,627 MB).
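The same arithmetic, written out (a quick sketch using the answer's P100 numbers and, for the question's RTX 2070, an assumed ~500 MB overhead on an ~8,192 MB card):

# Answer's P100: fraction 0.1 of 16,276 MB plus the 345 MB default device
print(0.1 * 16276 + 345)   # ~1972 MB, matching the observed 1,971 MB

# Question's RTX 2070: fraction 0.12 of ~8,192 MB plus an assumed ~500 MB
print(0.12 * 8192 + 500)   # ~1483 MB, i.e. the ~1.5 GB seen in nvidia-smi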
Source: https://stackoverflow.com/questions/55531944/how-to-restrict-tensorflow-gpu-memory-usage