What is the intuition behind using tanh in LSTM?
Question: In the LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If it is, can I change both to another activation function (e.g. ReLU)?

Answer 1: Sigmoid, specifically, is used as the gating function for the three gates (input, output, forget) in the LSTM: since it outputs a value between 0 and 1, it can allow either no flow or complete flow of information through a gate. On the other hand, to overcome the vanishing