How to determine the maximum batch size for a seq2seq TensorFlow RNN training model

Posted by 北城以北 on 2019-12-03 16:41:40

By default, TensorFlow reserves nearly all of the available GPU memory. However, there is a way to change this. In my model, I do this:

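import tensorflow as tf  # TensorFlow 1.x API
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once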

Then you can use this config when you start your session:

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())  # then run your training steps here

Now the model allocates GPU memory only as it needs it, so you can try different batch sizes and see at which point it runs out of memory.

The memory usage when running a TensorFlow model depends on how many variables you have in your model, as well as on the intermediate tensors that the TensorFlow runtime uses to compute activations, gradients, etc. For instance, in your model, if the input_size is 1024, the memory used for variables per layer would be 4 MB + 4 KB: a 1024 x 1024 weight matrix of 4-byte float32 values plus 1024 biases. The memory used for intermediate tensors grows linearly with the batch size, but the exact amount is hard to estimate, as it depends on how the runtime decides to schedule the operations. 12 GB should be able to fit quite a large model, though.
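The 4 MB + 4 KB figure can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes one square weight matrix per layer stored as float32, and the helper name layer_variable_bytes is made up for illustration. A real RNN cell may hold more than one such matrix, so treat the result as a rough lower bound.

```python
BYTES_PER_FLOAT32 = 4  # assumption: variables are stored as float32


def layer_variable_bytes(input_size, output_size, bytes_per_value=BYTES_PER_FLOAT32):
    """Return (weight_bytes, bias_bytes) for one dense weight matrix and its bias vector."""
    weight_bytes = input_size * output_size * bytes_per_value
    bias_bytes = output_size * bytes_per_value
    return weight_bytes, bias_bytes


# input_size = 1024 as in the example above; the output size is assumed to be 1024 too.
weights, biases = layer_variable_bytes(1024, 1024)
print(weights / 2**20, "MiB of weights")
print(biases / 2**10, "KiB of biases")
```

This covers the variables only. The intermediate tensors, which scale with the batch size, are not included.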

Elaborating a bit on the prior answer, it is difficult to analytically forecast the exact max RAM consumption of a model because the TF runtime has some freedom to schedule independent operations simultaneously, and doing so can result in higher max RAM use than executing the same ops sequentially. Op scheduling is dynamic, hence the maximum amount of RAM used in a training step can vary non-deterministically from step to step. In practice, for non-trivial models it seems necessary to experiment to find the largest batch size that will consistently work.
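One way to organise that experiment is a binary search over batch sizes. The sketch below is an assumption-laden outline, not a tested recipe: find_max_batch_size and fake_fits are hypothetical names, and the search assumes that if one batch size runs out of memory, every larger one does too. In TensorFlow 1.x, a real fits function would build the model for the given batch size, run a few training steps in a session, and return False when tf.errors.ResourceExhaustedError is raised.

```python
def find_max_batch_size(fits, low=1, high=4096):
    """Return the largest batch size in [low, high] for which fits(size) is True.

    Returns 0 if no size in the range fits. Assumes that once a size fails,
    all larger sizes fail as well.
    """
    best = 0
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best = mid      # mid works, so look for something larger
            low = mid + 1
        else:
            high = mid - 1  # mid ran out of memory, so look lower
    return best


def fake_fits(batch_size, limit=300):
    """Stand-in for a real trial run; pretends memory runs out above `limit`."""
    return batch_size <= limit


print(find_max_batch_size(fake_fits))
```

Because peak memory can vary from step to step, it is probably safer to run several steps per trial and to pick a batch size somewhat below the value the search returns.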
