TensorFlow CUDA_ERROR_OUT_OF_MEMORY

荒凉一梦 submitted on 2020-12-04 18:05:49

Question


I'm trying to build a large CNN in TensorFlow and intend to run it on a multi-GPU system. I've adopted a "tower" design and split each batch across both GPUs, while keeping the variables and other computations on the CPU. My system has 32GB of memory, but when I run my code I get the error:

E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 17179869184
Killed
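
For scale, the size in the log can be converted to more readable units. This is only a small arithmetic sketch; it assumes the "32GB" mentioned in the question means 32 GiB, which the question does not state.

```python
# Size reported in the log line "failed to alloc 17179869184 bytes on host".
requested_bytes = 17179869184

GIB = 2 ** 30  # bytes per gibibyte

requested_gib = requested_bytes / GIB
print(f"requested pinned host memory: {requested_gib:.0f} GiB")  # prints 16 GiB

# Assumption: the system's "32GB" is treated as 32 GiB.
system_gib = 32
share = requested_gib / system_gib
print(f"share of system memory: {share:.0%}")  # prints 50%
```

So the failed request is a single 16 GiB block of pinned host memory, about half of the machine's RAM under that assumption.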

I've seen that the code works (though very slowly) if I hide the CUDA devices from TensorFlow, so that it doesn't use cudaMallocHost().
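
The question does not say how the devices were hidden. One common way, shown here as an assumption rather than as the asker's actual method, is to set the CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported. The helper name hide_cuda_devices is hypothetical.

```python
import os


def hide_cuda_devices():
    """Hide all CUDA devices from libraries loaded later in this process."""
    # An empty value leaves no GPU visible to the CUDA runtime.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return os.environ["CUDA_VISIBLE_DEVICES"]


hide_cuda_devices()

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

The variable has to be set before the CUDA runtime is initialized, which is why the call comes ahead of the TensorFlow import.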

Thank you for your time.


Answer 1:


There are a few options:

1- reduce your batch size

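2- allow GPU memory growth:

import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all at startup.
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)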

3- don't allocate all of your GPU memory (only 90%):

config = tf.ConfigProto()
# Let this process use at most 90% of each visible GPU's memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.9
session = tf.Session(config=config, ...)
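
The snippets above use the TensorFlow 1.x session API. For readers on TensorFlow 2.x, a rough equivalent of the memory growth option might look like the sketch below. It is not part of the original answer; the helper name enable_memory_growth is hypothetical, and the guarded import is there only so the snippet also runs on a machine without TensorFlow.

```python
def enable_memory_growth():
    """Ask TensorFlow 2.x to grow GPU memory on demand; return the GPU count."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed, so there is nothing to configure

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialized,
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_memory_growth()
print(f"memory growth enabled on {configured} GPU(s)")
```

Setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true before starting the process is another way to request the same behavior.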



Answer 2:


Reduce the batch_size in your code to 100; then it will work.
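
If the right batch size is not known in advance, one way to look for it is to halve the batch size until a step succeeds. The sketch below is an illustration under assumptions: find_workable_batch_size and fake_step are hypothetical names, and MemoryError stands in for whatever out-of-memory exception the real training step raises. It cannot help when the process is killed outright, as in the log in the question.

```python
def find_workable_batch_size(run_step, batch_size, min_batch_size=1):
    """Halve batch_size until run_step(batch_size) stops raising MemoryError."""
    while batch_size >= min_batch_size:
        try:
            run_step(batch_size)
            return batch_size
        except MemoryError:
            batch_size //= 2  # try again with half as many samples
    raise MemoryError("no batch size >= min_batch_size fits in memory")


# Hypothetical stand-in for a training step: it pretends that anything
# above 100 samples runs out of memory.
def fake_step(batch_size):
    if batch_size > 100:
        raise MemoryError


print(find_workable_batch_size(fake_step, 512))  # prints 64
```

Starting from 512, the search tries 512, 256 and 128, all of which fail in this toy setup, and settles on 64.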



Source: https://stackoverflow.com/questions/43503409/tensorflow-cuda-error-out-of-memory
