I am using a Softmax activation function in the last layer of a neural network, but I have problems with a numerically stable ("safe") implementation of this function.
A naive implementation simply exponentiates each output and divides by the sum of the exponentials. This overflows: once any input exceeds roughly 709, exp() returns inf in double precision and the final result is NaN.
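For illustration, here is a minimal NumPy sketch of the naive approach (the function name and test values are just for this example, not my actual network code):

```python
import numpy as np

def softmax_naive(x):
    # Direct translation of the definition: y_i = exp(x_i) / sum_j exp(x_j).
    e = np.exp(x)       # overflows to inf once any x_i exceeds ~709
    return e / e.sum()  # finite/inf -> 0, inf/inf -> nan

print(softmax_naive(np.array([1.0, 2.0, 3.0])))  # fine for small inputs
print(softmax_naive(np.array([10.0, 1000.0])))   # [ 0. nan] plus an overflow warning
```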
First go to log scale, i.e. calculate log(y) instead of y. The log of the numerator is trivial. To calculate the log of the denominator, you can use the following 'trick': http://lingpipe-blog.com/2009/06/25/log-sum-of-exponentials/
With inputs x_1, ..., x_n and outputs y_i = exp(x_i) / sum_j exp(x_j), this gives

    log(y_i) = x_i - log(sum_j exp(x_j))
             = x_i - m - log(sum_j exp(x_j - m)),   where m = max_j x_j

Every exponent x_j - m is <= 0, so the sum can never overflow. If you need the probabilities themselves, recover y_i as exp(log(y_i)) at the very end.
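A minimal sketch of the stable version in NumPy (assuming the shift-by-the-maximum form of the trick from the linked post; function names are illustrative):

```python
import numpy as np

def log_softmax(x):
    # Shift by the maximum so every exponent is <= 0: exp() may underflow
    # to 0 for very negative entries, but it can never overflow.
    m = np.max(x)
    # log-sum-exp trick: log(sum_j exp(x_j)) = m + log(sum_j exp(x_j - m))
    lse = m + np.log(np.sum(np.exp(x - m)))
    return x - lse  # log(y_i) = x_i - log(sum_j exp(x_j))

def softmax(x):
    # Exponentiate at the very end, only if actual probabilities are needed.
    return np.exp(log_softmax(x))

print(softmax(np.array([10.0, 1000.0])))  # [0. 1.] -- no overflow, no NaN
```

If you train with a cross-entropy loss, you can feed log_softmax directly into the loss and skip the final exp altogether.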