What is the intuition behind using tanh in an LSTM?

执笔经年 2020-12-22 15:49

In LSTM networks (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change it to another activation function?

2 Answers
  •  滥情空心
    2020-12-22 16:21

    LSTMs manage an internal state vector whose values should be able to increase or decrease when we add the output of some function. Sigmoid output is always non-negative; values in the state would only increase. The output from tanh can be positive or negative, allowing for increases and decreases in the state.
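
    For instance, a minimal sketch (not from the original answer, just NumPy to show the two output ranges):

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
        print(sigmoid(x))   # everything in (0, 1): additions could only push the state up
        print(np.tanh(x))   # values in (-1, 1): additions can push the state up or down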

    That's why tanh is used to determine candidate values to get added to the internal state. The GRU cousin of the LSTM doesn't have a second tanh, so in a sense the second one is not necessary. Check out the diagrams and explanations in Chris Olah's Understanding LSTM Networks for more.

    The answer to the related question, "why are sigmoids used in LSTMs where they are?", also comes down to the functions' possible outputs: "gating" is achieved by multiplying by a number between zero and one, and that is exactly what a sigmoid produces.
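
    Putting the two together, here is a bare-bones sketch of one LSTM cell step (the weight layout and variable names are illustrative assumptions, roughly following the equations in Olah's post, not anyone's production code):

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def lstm_step(x, h_prev, c_prev, W, b):
            # W maps the concatenated [h_prev, x] to the four gate pre-activations.
            z = W @ np.concatenate([h_prev, x]) + b
            i, f, o, g = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1): "how much" gets through
            g = np.tanh(g)                                # candidates in (-1, 1): can raise or lower the state
            c = f * c_prev + i * g                        # state can move in either direction
            h = o * np.tanh(c)                            # squash the state before exposing it
            return h, c

        # toy usage with random weights
        H, D = 3, 2
        rng = np.random.default_rng(0)
        W = rng.normal(size=(4 * H, H + D))
        b = np.zeros(4 * H)
        h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)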

    There aren't really meaningful differences between the derivatives of sigmoid and tanh; tanh is just a rescaled and shifted sigmoid: see Richard Socher's Neural Tips and Tricks. If second derivatives are relevant, I'd like to know how.
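
    The rescaling claim is easy to check numerically, since tanh(x) = 2 * sigmoid(2x) - 1:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        x = np.linspace(-5.0, 5.0, 11)
        print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True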
