Question
I made a TensorFlow model with relatively common operations (apart from a couple of tf.where calls and some index handling), but I call it with widely varying input shapes (many undefined tensor shapes in the model).
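For context, a minimal sketch of the kind of graph I mean (the shapes and ops here are hypothetical, not my actual model): dimensions left undefined, tf.where plus index handling, and a different input shape on every call.

    import numpy as np
    import tensorflow as tf

    # Both dimensions left undefined, so the graph must handle arbitrary shapes.
    x = tf.placeholder(tf.float32, shape=[None, None])

    # tf.where + index handling, similar in spirit to what the question mentions.
    idx = tf.where(x > 0.0)          # indices of positive entries
    vals = tf.gather_nd(x, idx)      # gather those entries
    loss = tf.reduce_sum(vals)

    with tf.Session() as sess:
        for step in range(1000):
            # Every call uses a different input shape.
            h, w = np.random.randint(10, 500, size=2)
            sess.run(loss, feed_dict={x: np.random.randn(h, w).astype(np.float32)})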
Everything works fine on the CPU. But when using the GPU, the RAM usage (CPU memory, not GPU memory) steadily increases until it fills the machine's 256 GB and the process gets killed.
During the process, I get the usual messages:
2017-03-17 16:42:22.366601: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 18347 get requests, put_count=18345 evicted_count=1000 eviction_rate=0.0545108 and unsatisfied allocation rate=0.0763068
2017-03-17 16:42:22.366680: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 4385 to 4823
As far as I understand, these come from the pool allocator that manages pinned (DMA) host memory for GPU transfers. The problem is that it never seems satisfied with the eviction rate it gets and never stops growing the pool.
Is this normal behavior? Are there ways to control it? Right now, I cannot train the model for longer than an hour before running out of memory.
Note: I use the nightly build of TF because of some bug fixes necessary for my current model to run. Also, no operations are added during training, because I called tf.get_default_graph().finalize().
EDIT: I tried running with tcmalloc instead of malloc. It did not help. I also used the memory profiler, and it does not report a leak: according to tcmalloc, memory usage stabilizes around 500 MB, even though the usage shown in top is much higher and the program eventually runs OOM. So why does the tcmalloc profiler not agree with the memory usage I see in top?
EDIT 2: I recompiled TF with changed hardcoded parameters to make it "work". See here
Answer 1:
This specific problem was solved some time ago by the TF team when they changed their memory allocator (see the corresponding issue on GitHub).
If you encounter memory growth during training, a common mistake is that nodes are being added to the graph inside the training loop (TF is not NumPy: unless you use eager execution, every op call adds a node to the graph). Make sure to call graph.finalize() before your training loop so that no nodes can be added during training; this catches many memory-growth issues.
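For illustration, a minimal sketch of that pattern, assuming a trivial placeholder model (not the asker's actual code):

    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 10])
    w = tf.Variable(tf.zeros([10, 1]))
    loss = tf.reduce_sum(tf.matmul(x, w))
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    init = tf.global_variables_initializer()

    # Freeze the graph: any attempt to add a node after this raises a RuntimeError.
    tf.get_default_graph().finalize()

    with tf.Session() as sess:
        sess.run(init)
        for step in range(100):
            batch = np.random.randn(32, 10).astype(np.float32)  # dummy data
            sess.run(train_op, feed_dict={x: batch})
            # Accidentally calling e.g. tf.reduce_mean(loss) here would now fail
            # loudly instead of silently growing the graph every iteration.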
Source: https://stackoverflow.com/questions/42861956/gpu-poolallocator-explodes-the-cpu-memory