Perhaps too general a question, but can anyone explain what would cause a Convolutional Neural Network to diverge?
Specifics:
I am using Tensorflow's iris_tra
If you're training with a cross-entropy loss, you want to add a small number like 1e-8 to your output probabilities.
Because log(0) is negative infinity, once your model has trained enough the output distribution will become very skewed. For instance, say I'm doing a 4-class output; in the beginning my probabilities look like
0.25 0.25 0.25 0.25
but toward the end the probabilities will probably look like
1.0 0 0 0
If you take the cross entropy of this distribution, everything will explode. The fix is to artificially add a small number to all the terms to prevent this.
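As a minimal sketch of that fix in TensorFlow (the helper name safe_cross_entropy and the epsilon value are illustrative, not from the original post), you can clip the predicted probabilities away from zero before taking the log, so the loss never evaluates log(0):

    import tensorflow as tf

    def safe_cross_entropy(labels, probs, eps=1e-8):
        # labels: one-hot targets, shape [batch, num_classes]
        # probs:  predicted probabilities, same shape
        probs = tf.clip_by_value(probs, eps, 1.0)  # keep every term away from 0
        # cross entropy: -sum(y * log(p)) per example, averaged over the batch
        return -tf.reduce_mean(tf.reduce_sum(labels * tf.math.log(probs), axis=1))

In practice, using a loss that works on logits directly (e.g. tf.nn.softmax_cross_entropy_with_logits) sidesteps the problem entirely, since it computes the log-softmax in a numerically stable way.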