I'm training the Keras object detection model linked at the bottom of this question, although I believe my problem has to do neither with Keras nor with the specific model I'm
I would add gradient clipping, because it prevents spikes in the gradients from messing up the parameters during training.
Gradient Clipping is a technique to prevent exploding gradients in very deep networks, typically Recurrent Neural Networks.
Most frameworks allow you to add a gradient clipping parameter to your gradient-descent-based optimizer.
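To make the idea concrete, here is a minimal sketch in plain Python of the two common variants, clipping by norm and clipping by value. The function names `clip_by_norm` and `clip_by_value` and the thresholds are my own illustrative choices, not any library's API. In Keras you would normally not write this yourself: the built-in optimizers accept `clipnorm` and `clipvalue` arguments, as noted in the comment at the end.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale one gradient vector so its L2 norm is at most max_norm.

    The direction of the gradient is preserved; only its length shrinks.
    (Hypothetical helper for illustration.)
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)          # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]

def clip_by_value(grad, limit):
    """Clamp every component of the gradient to the range [-limit, limit].

    Unlike norm clipping, this can change the gradient's direction.
    (Hypothetical helper for illustration.)
    """
    return [max(-limit, min(limit, g)) for g in grad]

# A gradient spike with L2 norm 5.0 is scaled down to norm 1.0.
spike = [3.0, 4.0]
clipped = clip_by_norm(spike, max_norm=1.0)

# In Keras the same effect is requested on the optimizer, e.g.:
#   optimizer = keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)
#   optimizer = keras.optimizers.SGD(learning_rate=0.01, clipvalue=0.5)
# (learning rate and thresholds here are placeholder values to tune)
```

Norm clipping is usually the gentler choice, since it keeps the update direction and only limits the step size. The threshold is a hyperparameter; a reasonable starting point is to log the gradient norms for a while and pick a value somewhat above what you see during stable training.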