I'm writing some basic neural network methods - specifically the activation functions - and have hit the limits of my rubbish knowledge of math. I understand the respective ranges (tanh gives -1 to 1, the logistic gives 0 to 1), just not why one would be preferred over the other.
The word is (and I've tested) that in some cases it might be better to use tanh rather than the logistic:

A value near y = 0 from the logistic, multiplied by a weight w, stays near 0 and so has little effect on the layers it feeds into (although absence of a signal is itself information), whereas a value near y = -1 from tanh, multiplied by a weight w, can still be a large number with more numeric effect.

Also, the derivative of tanh (y' = 1 - y^2) yields values greater than the logistic's (y' = y(1 - y) = y - y^2). For example, at z = 0 the logistic gives y = 0.5 and y' = 0.25, while tanh gives y = 0 but y' = 1 (you can see this in general just by looking at the graphs). Meaning that a tanh layer might learn faster than a logistic layer because of the larger gradient magnitude.
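As a quick check of those numbers, here is a minimal NumPy sketch (my own illustration, not from any particular library) that evaluates both activations and their derivatives at a few pre-activation values z:

```python
import numpy as np

def logistic(z):
    # Logistic (sigmoid): output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(z):
    # Derivative written in terms of the output: y * (1 - y)
    y = logistic(z)
    return y * (1.0 - y)

def tanh_grad(z):
    # Derivative written in terms of the output: 1 - y^2
    y = np.tanh(z)
    return 1.0 - y ** 2

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  logistic y={logistic(z):.3f} y'={logistic_grad(z):.3f}  "
          f"tanh y={np.tanh(z):+.3f} y'={tanh_grad(z):.3f}")
```

At z = 0 this prints y' = 0.25 for the logistic and y' = 1.0 for tanh, which is exactly the gap in gradient magnitude described above.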