Why is the gradient of tanh in TensorFlow `grad = dy * (1 - y*y)`?

Submitted by 倖福魔咒の on 2021-02-08 04:46:24

Question


tf.raw_ops.TanhGrad says that grad = dy * (1 - y*y), where y = tanh(x).

But since dy / dx = 1 - y*y, where y = tanh(x), I think grad should be dy / (1 - y*y). Where am I wrong?


Answer 1:


An expression like dy / dx is a mathematical notation for the derivative, it is not an actual fraction. It is meaningless to move dy or dx around individually as you would with a numerator and denominator.

Mathematically, it is known that d(tanh(x))/dx = 1 - (tanh(x))^2.

TensorFlow computes gradients "backwards" (what is called backpropagation, or more generally reverse-mode automatic differentiation). That means that, in general, we reach the computation of the gradient of tanh(x) only after computing the gradient of an "outer" function g(tanh(x)), where g represents all the operations applied to the output of tanh to reach the value for which the gradient is computed. By the chain rule, d(g(tanh(x)))/dx = d(g(tanh(x)))/d(tanh(x)) * d(tanh(x))/dx.

The first factor, d(g(tanh(x)))/d(tanh(x)), is the gradient accumulated backwards up to tanh, that is, the derivative of all those later operations; this is the dy in the documentation of the function. Therefore, you only need to compute d(tanh(x))/dx, which is (1 - y * y) because y = tanh(x), and multiply it by the given dy. The resulting value is then propagated further back to the operation that produced the input x to tanh in the first place, where it becomes the dy in that operation's gradient computation, and so on until the gradient sources are reached.
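The chain-rule bookkeeping above can be sketched numerically. This is a minimal NumPy stand-in, not the actual TensorFlow kernel: `tanh_grad` is a hypothetical helper that mirrors the documented formula `grad = dy * (1 - y*y)`, receiving the forward output y = tanh(x) and the incoming gradient dy rather than x itself. The outer function g(y) = y^2 is an arbitrary choice for illustration.

```python
import numpy as np

# Hypothetical helper mirroring tf.raw_ops.TanhGrad's documented formula:
# it takes y = tanh(x) and the incoming (upstream) gradient dy, not x.
def tanh_grad(y, dy):
    return dy * (1.0 - y * y)

x = np.array([-1.5, -0.3, 0.0, 0.7, 2.0])

# Forward pass: y = tanh(x), then an illustrative "outer" op g(y) = y**2.
y = np.tanh(x)
z = y * y

# Backward pass (reverse mode): start from d(sum(z))/dz = 1 and apply
# each local gradient in reverse order of the forward ops.
dz = np.ones_like(z)
dy = dz * 2.0 * y        # local gradient of g(y) = y**2
dx = tanh_grad(y, dy)    # grad = dy * (1 - y*y), as in the docs

# Analytic check: d/dx [tanh(x)^2] = 2*tanh(x) * (1 - tanh(x)^2)
expected = 2.0 * np.tanh(x) * (1.0 - np.tanh(x) ** 2)
assert np.allclose(dx, expected)
```

Note that `dy` here is exactly the reverse-accumulated gradient described above: it already contains the derivative of everything downstream of tanh, so the tanh step only contributes its local factor (1 - y*y).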



Source: https://stackoverflow.com/questions/62634073/why-gradient-of-tanh-in-tensorflow-is-grad-dy-1-yy
