How to determine the maximum batch size for a seq2seq TensorFlow RNN training model

Currently, I am using the default batch size of 64 for the seq2seq TensorFlow model. What are the maximum batch size, layer size, etc. that I can use with a single Titan X GPU (12 GB of memory) on a Haswell-E Xeon machine with 128 GB of RAM? The input data is converted to embeddings. Below are some relevant parameters I am using; the cell input size appears to be 1024:

encoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
tf.app.flags.DEFINE_integer("size", 1024, "Size of each model layer.")

So, based on my hardware, what are the maximum batch size, number of layers, and input size I can use? Currently the GPU shows that 99% of its memory is occupied.

By default, TensorFlow occupies all available GPU memory. However, there is a way to change this. In my model, I do this:

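import tensorflow as tf
# Allocate GPU memory on demand instead of reserving all of it up front (TF 1.x API).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True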

Then you can use this config when you start your session:

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())  # then run your training steps as usual

Now the model will allocate GPU memory only as it needs it, so you can try different batch sizes and see at which point it runs out of memory.

The memory usage when running a TensorFlow model depends on how many variables you have in your model, as well as on the intermediate tensors that the TensorFlow runtime uses to compute activations, gradients, etc. For instance, in your model, if the input_size is 1024, a single 1024 x 1024 weight matrix of 32-bit floats plus its bias vector takes about 4 MB + 4 KB per layer (gated cells such as GRU or LSTM hold several such matrices, so they need a multiple of that). The memory used for intermediate tensors grows linearly with the batch size, but the exact amount is hard to estimate, as it depends on how the runtime decides to schedule the operations. 12 GB should be able to fit quite a large model, though.
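To make the 4 MB + 4 KB figure concrete, here is a small arithmetic sketch. It assumes 32-bit floats and one square weight matrix with one bias vector per layer; the helper name dense_layer_param_bytes is made up for illustration and is not part of TensorFlow.

```python
def dense_layer_param_bytes(input_size, output_size, bytes_per_value=4):
    """Bytes needed for one weight matrix and one bias vector.

    Assumes float32 (4 bytes per value) unless told otherwise.
    """
    weight_bytes = input_size * output_size * bytes_per_value
    bias_bytes = output_size * bytes_per_value
    return weight_bytes, bias_bytes


weight_bytes, bias_bytes = dense_layer_param_bytes(1024, 1024)
print(weight_bytes / 2**20, "MiB for weights")  # 4.0
print(bias_bytes / 2**10, "KiB for biases")     # 4.0
```

This only covers the variables. Optimizer state and the batch-dependent intermediate tensors come on top of it, and those usually dominate.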

Elaborating a bit on the prior answer, it is difficult to analytically forecast the exact max RAM consumption of a model because the TF runtime has some freedom to schedule independent operations simultaneously, and doing so can result in higher max RAM use than executing the same ops sequentially. Op scheduling is dynamic, hence the maximum amount of RAM used in a training step can vary non-deterministically from step to step. In practice, for non-trivial models it seems necessary to experiment to find the largest batch size that will consistently work.
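The experiment can be automated. Below is a rough sketch, not a definitive recipe: it doubles the batch size until a trial fails, then binary-searches between the last success and the first failure. The names find_max_batch_size and try_step are hypothetical; try_step is a function you would write that builds a batch of the given size and runs one or more training steps. The sketch assumes that an out-of-memory failure surfaces as an exception (in TensorFlow this is typically tf.errors.ResourceExhaustedError, which you would pass as oom_errors) and that if a batch size fails, all larger ones fail too.

```python
def find_max_batch_size(try_step, start=1, limit=4096, oom_errors=(MemoryError,)):
    """Return the largest batch size in [start, limit] for which try_step
    succeeds, or 0 if even `start` fails."""

    def fits(batch_size):
        try:
            try_step(batch_size)
            return True
        except oom_errors:
            return False

    if not fits(start):
        return 0

    lo = start       # largest batch size known to work
    hi = start * 2   # next candidate
    # Doubling phase: grow until a trial fails or the limit is passed.
    while hi <= limit and fits(hi):
        lo = hi
        hi *= 2
    # hi is now either a known failure or beyond the limit; never test past limit.
    hi = min(hi, limit + 1)

    # Binary search between the last success (lo) and the upper bound (hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Stand-in for a real training step: pretend anything above 100 runs out of memory.
def fake_step(batch_size):
    if batch_size > 100:
        raise MemoryError("simulated out-of-memory")


print(find_max_batch_size(fake_step))  # 100
```

Because peak memory can vary from step to step, it is safer to have try_step run several steps per trial and to pick a batch size somewhat below the value found.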

Source: https://stackoverflow.com/questions/35171405/how-to-determine-maximum-batch-size-for-a-seq2seq-tensorflow-rnn-training-model

Tags

machine-learning

tensorflow

gpu-programming

recurrent-neural-network