Sometimes I run into an out-of-memory error while training, e.g.:

OOM when allocating tensor with shape
You can estimate the largest batch size that fits in GPU memory with:

max batch size = available GPU memory (bytes) / 4 / (tensor elements per sample + trainable parameters)

The division by 4 assumes float32 values, which take 4 bytes each; this is a rough heuristic, not an exact bound.
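As a sketch, the heuristic above can be written as a small helper. The function name and the example counts (activation elements per sample, parameter count, GPU size) are hypothetical placeholders, not values from a real model:

```python
def estimate_max_batch_size(gpu_memory_bytes: int,
                            tensor_elems_per_sample: int,
                            trainable_params: int) -> int:
    """Rough heuristic: memory / 4 bytes per float32 /
    (tensor elements per sample + trainable parameters)."""
    return gpu_memory_bytes // (4 * (tensor_elems_per_sample + trainable_params))

# Hypothetical example: 8 GiB GPU, ~1M activation elements per
# sample, ~5M trainable parameters.
print(estimate_max_batch_size(8 * 1024**3, 1_000_000, 5_000_000))
```

In practice the safe batch size is usually smaller, since optimizers (e.g. Adam) keep extra per-parameter state and frameworks reserve memory of their own, so treat the result as an upper bound to search downward from.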