How to restrict tensorflow GPU memory usage?


Question


I am using tensorflow-gpu 1.13.1 on Ubuntu 18.04 with CUDA 10.0 and an Nvidia GeForce RTX 2070 (driver version 415.27).

I used code like the snippet below to manage TensorFlow's memory usage. The GPU has about 8 GB of memory, so TensorFlow should not be able to allocate more than about 1 GB of it. But when I check memory usage with the nvidia-smi command, I see it using ~1.5 GB, despite the fact that I restricted the amount with GPUOptions.

import numpy as np
import tensorflow as tf

# Cap this process at 12% of GPU memory (~1 GB on an 8 GB card)
memory_config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.12))
memory_config.gpu_options.allow_growth = False

# graph, tensor_dict, image_tensor and image come from the model loaded elsewhere
with tf.Session(graph=graph, config=memory_config) as sess:
    output_dict = sess.run(tensor_dict,
                           feed_dict={image_tensor: np.expand_dims(image, 0)})
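
For reference, a quick check of those numbers (the 8 GB and ~1.5 GB figures are approximate):

# Quick check of the numbers above (approximate values)
total_memory_mb = 8192                    # ~8 GB on the RTX 2070
expected_cap_mb = 0.12 * total_memory_mb  # ~983 MB, roughly the intended 1 GB limit
observed_mb = 1500                        # what nvidia-smi actually reports
print(observed_mb - expected_cap_mb)      # ~517 MB allocated beyond the requested cap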

Why does this happen? And how can I avoid it, or at least calculate the memory needs of every session? I need strict limits for every process, because I run several parallel instances with different sessions, so I need to be sure there will be no resource race.

BTW, I have tried setting memory_config.gpu_options.allow_growth to False, but it has no effect. TensorFlow still allocates memory the same way regardless of this flag's value, which also seems strange.


Answer 1:


Solution

Try with gpu_options.allow_growth = True to see how much memory is consumed by default when a tf.Session is created. That memory is always allocated, regardless of your settings.

Based on your result, it should be somewhere below 500 MB. So if you want each process to truly use 1 GB in total, calculate:

per_process_gpu_memory_fraction = (1 GB - default memory) / total GPU memory
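
As a rough sketch of that calculation (assuming the default overhead on your card is about 500 MB, as estimated above, and ~8 GB of total memory):

# Sketch: pick a fraction so the process ends up at roughly 1 GB total in nvidia-smi.
# The 500 MB overhead is an assumed value based on the discussion above.
total_memory_mb = 8192
default_overhead_mb = 500
target_total_mb = 1024

fraction = (target_total_mb - default_overhead_mb) / total_memory_mb
print(round(fraction, 3))  # ~0.064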

Reason

When you create a tf.Session, a TensorFlow device is created on the GPU regardless of your configuration, and this device requires some minimum amount of memory.

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving a fraction up front
conf = tf.ConfigProto()
conf.gpu_options.allow_growth = True
session = tf.Session(config=conf)

Given allow_growth=True, you might expect no GPU allocation at all. In reality, however, it yields:

2019-04-05 18:44:43.460479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15127 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0)

which occupies a small fraction of memory (in my experience, the exact amount differs by GPU model). NOTE: setting allow_growth occupies almost the same amount of memory as setting per_process_gpu_memory_fraction to a tiny value such as 0.00001, but the latter won't be able to create the session properly.

In this case, it is 345 MB.

That is the offset you are experiencing. Now let's look at the per_process_gpu_memory_fraction case:

conf = tf.ConfigProto()
conf.gpu_options.per_process_gpu_memory_fraction = 0.1  # request 10% of total GPU memory
session = tf.Session(config=conf)

Since the GPU has 16,276 MB of memory, setting per_process_gpu_memory_fraction = 0.1 may lead you to expect that only about 1,627 MB will be allocated. But the truth is:

1,971 MB is allocated, which, however, is roughly the sum of the default memory (345 MB) and the expected memory (1,627 MB).
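
To make the arithmetic explicit with the numbers above:

# Observed allocation ≈ fixed device overhead + requested fraction of total memory
total_mb = 16276
overhead_mb = 345                         # measured above with allow_growth=True
fraction = 0.1
print(overhead_mb + fraction * total_mb)  # ~1972.6 MB, close to the observed 1,971 MB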



Source: https://stackoverflow.com/questions/55531944/how-to-restrict-tensorflow-gpu-memory-usage
