In MNIST LSTM examples, I don't understand what "hidden layer" means. Is it the imaginary layer formed when you represent an unrolled RNN over time?
This term `num_units` or `num_hidden_units`, sometimes noted with the variable name `nhid` in implementations, means that the hidden state carried into and out of the LSTM cell is a vector of dimension `nhid` (or, for a batched implementation, a matrix of shape `batch_size x nhid`). As a result, the output (from the LSTM cell) is also of dimensionality `nhid`, since an RNN/LSTM/GRU cell doesn't alter the dimensionality of its hidden state from one time step to the next.
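As a quick sanity check, here is a minimal sketch using `tf.keras` (the concrete numbers, e.g. `nhid = 128` and the MNIST-style input shape, are assumptions for illustration): whatever the input dimension is, the output/hidden state comes back with dimension `nhid`.

```python
import tensorflow as tf

nhid = 128        # num_units / num_hidden_units (assumed value)
batch_size = 32
timesteps = 28
input_dim = 28    # e.g. one row of a 28x28 MNIST image per time step

# An LSTM layer whose hidden state (and hence output) has dimension nhid
lstm = tf.keras.layers.LSTM(units=nhid)

x = tf.random.normal((batch_size, timesteps, input_dim))
h = lstm(x)        # final hidden state after the last time step
print(h.shape)     # (32, 128) -> batch_size x nhid, independent of input_dim
```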
As pointed out earlier, this term was borrowed from the Feed-Forward Neural Network (FFN) literature and has caused confusion when used in the context of RNNs. But the idea is that even an RNN can be viewed as an FFN at each time step. In this view, the hidden layer would indeed contain `num_hidden` units, as depicted in this figure:
Source: Understanding LSTM
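To make that per-time-step view concrete, here is a minimal sketch of one vanilla-RNN step in plain NumPy (the sizes `num_hidden = 128` and `input_dim = 28` are assumptions, not from the original): each step is just a feed-forward layer with `num_hidden` hidden units, reusing the same weights at every step.

```python
import numpy as np

num_hidden = 128   # assumed value, for illustration
input_dim = 28

W_xh = np.random.randn(num_hidden, input_dim) * 0.01   # input-to-hidden weights
W_hh = np.random.randn(num_hidden, num_hidden) * 0.01  # hidden-to-hidden weights
b_h = np.zeros(num_hidden)

def step(x_t, h_prev):
    """One unrolled time step: an FFN layer with num_hidden units."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(num_hidden)                    # initial hidden state
for x_t in np.random.randn(5, input_dim):   # 5 time steps of toy input
    h = step(x_t, h)
print(h.shape)  # (128,) -> one hidden "layer" of num_hidden units per step
```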
More concretely, in the example below `num_hidden_units` or `nhid` would be 3, since the size of the hidden state (middle layer) is a 3-dimensional vector.
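Reproducing that in code, a minimal sketch with `tf.keras.layers.LSTMCell` (the input dimension of 4 is an arbitrary assumption): with `units=3`, the output and both the hidden and cell states come back as 3-dimensional vectors per batch element.

```python
import tensorflow as tf

# Mirror the figure: a hidden state of size 3, i.e. nhid = 3
cell = tf.keras.layers.LSTMCell(units=3)

x_t = tf.random.normal((1, 4))                 # one input vector; dim 4 is arbitrary
state = [tf.zeros((1, 3)), tf.zeros((1, 3))]   # initial [h, c] states, each of size nhid

output, [h, c] = cell(x_t, state)
print(output.shape, h.shape, c.shape)          # (1, 3) (1, 3) (1, 3)
```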