What is the intuition behind using tanh in an LSTM?

执笔经年 2020-12-22 15:49

In LSTM networks (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change it to another activation function?

2 Answers
  •  滥情空心
    2020-12-22 16:21

    LSTMs manage an internal state vector whose values should be able to increase or decrease when we add the output of some function. Sigmoid output is always non-negative; values in the state would only increase. The output from tanh can be positive or negative, allowing for increases and decreases in the state.
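
    For instance, a minimal sketch (not from the original answer, just NumPy to show the two output ranges):

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
        print(sigmoid(x))   # everything in (0, 1): additions could only push the state up
        print(np.tanh(x))   # values in (-1, 1): additions can push the state up or down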

    That's why tanh is used to determine candidate values to get added to the internal state. The GRU cousin of the LSTM doesn't have a second tanh, so in a sense the second one is not necessary. Check out the diagrams and explanations in Chris Olah's Understanding LSTM Networks for more.

    The answer to the related question, "why are sigmoids used in LSTMs where they are?", also comes down to the functions' possible outputs: "gating" is achieved by multiplying by a number between zero and one, and that is exactly what a sigmoid produces.
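
    Putting the two together, here is a bare-bones sketch of one LSTM cell step (the weight layout and variable names are illustrative assumptions, roughly following the equations in Olah's post, not anyone's production code):

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def lstm_step(x, h_prev, c_prev, W, b):
            # W maps the concatenated [h_prev, x] to the four gate pre-activations.
            z = W @ np.concatenate([h_prev, x]) + b
            i, f, o, g = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1): "how much" gets through
            g = np.tanh(g)                                # candidates in (-1, 1): can raise or lower the state
            c = f * c_prev + i * g                        # state can move in either direction
            h = o * np.tanh(c)                            # squash the state before exposing it
            return h, c

        # toy usage with random weights
        H, D = 3, 2
        rng = np.random.default_rng(0)
        W = rng.normal(size=(4 * H, H + D))
        b = np.zeros(4 * H)
        h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)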

    There aren't really meaningful differences between the derivatives of sigmoid and tanh; tanh is just a rescaled and shifted sigmoid: see Richard Socher's Neural Tips and Tricks. If second derivatives are relevant, I'd like to know how.
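
    The rescaling claim is easy to check numerically, since tanh(x) = 2 * sigmoid(2x) - 1:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        x = np.linspace(-5.0, 5.0, 11)
        print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True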
