Sometimes I run into a problem:
OOM when allocating tensor with shape
e.g.

OOM when allocating tensor with shape ...
Use the summaries provided by torchsummary (pip install torchsummary) or Keras (built in).
E.g.
from torchsummary import summary
summary(model, input_size=(3, 224, 224))  # torchsummary needs the per-sample input shape; (3, 224, 224) is just an example, use your model's real input shape
.....
.....
================================================================
Total params: 1,127,495
Trainable params: 1,127,495
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.02
Forward/backward pass size (MB): 13.93
Params size (MB): 4.30
Estimated Total Size (MB): 18.25
----------------------------------------------------------------
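If you are on Keras instead, the built-in model.summary() gives the per-layer output shapes and parameter counts; a minimal sketch, assuming a small hypothetical Sequential model:

from tensorflow import keras

# toy model purely to illustrate the built-in summary
model = keras.Sequential([
    keras.layers.Input(shape=(224, 224, 3)),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(10),
])
model.summary()  # prints layer output shapes plus total/trainable/non-trainable params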
Each instance you put in the batch requires its own forward/backward pass in memory; the model itself only needs to be held once. People seem to prefer batch sizes that are powers of two, probably because of automatic layout optimization on the GPU.
Don't forget to scale your learning rate linearly when you increase the batch size (see the sketch below).
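A minimal sketch of that linear scaling rule; base_lr and base_batch_size are assumed reference values from whatever configuration you originally tuned, not numbers from the summary above:

def scale_lr(base_lr, base_batch_size, new_batch_size):
    # linear scaling rule: keep lr / batch_size constant
    return base_lr * (new_batch_size / base_batch_size)

# e.g. a model tuned with lr=0.1 at batch size 256, moved to batch size 1024
print(scale_lr(0.1, 256, 1024))  # -> 0.4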
Let's assume we have a Tesla P100 at hand with 16 GB memory.
max batch size = (GPU memory - model size) / (forward/backward pass size per instance)

(16000 - 4.3) / 13.93 = 1148.29

rounded down to a power of 2, that gives a batch size of 1024.
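The same back-of-the-envelope calculation in code; a rough sketch where the MB figures are the ones torchsummary reported above, and 16000 MB is an assumption of roughly 16 GB usable memory:

import math

def estimate_max_batch_size(gpu_mem_mb, model_size_mb, per_instance_mb):
    # memory left after loading the model, divided by the per-instance activation cost
    raw = (gpu_mem_mb - model_size_mb) / per_instance_mb
    # round down to the nearest power of two
    return 2 ** int(math.log2(raw))

print(estimate_max_batch_size(16000, 4.30, 13.93))  # -> 1024

In practice this is only an upper-bound estimate: it ignores memory held by the optimizer state, CUDA context, and fragmentation, so expect the real limit to be somewhat lower.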