What is the architecture behind the Keras LSTM Layer implementation?


Question


How do the input dimensions get converted to the output dimensions for the LSTM layer in Keras? From reading Colah's blog post, it seems as though the number of "timesteps" (AKA the input_dim, or the first value in the input_shape) should equal the number of neurons, which should equal the number of outputs from this LSTM layer (set by the units argument of the LSTM layer).

From reading this post, I understand the input shapes. What I am baffled by is how Keras plugs the inputs into each of the LSTM "smart neurons".

Keras LSTM reference

Example code that baffles me:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))

From this, I would think that the LSTM layer has 10 neurons and each neuron is fed a vector of length 64. However, it seems it has 32 neurons and I have no idea what is being fed into each. I understand that for the LSTM to connect to the Dense layer, we can just plug all 32 outputs into each of the 2 neurons. What confuses me is the input layer to the LSTM.

(similar SO post but not quite what I need)


Answer 1:


I was correct! The architecture is 10 neurons, each representing a time-step. Each neuron is fed a vector of length 64, representing the 64 features (the input_dim).

The 32 is the number of hidden units, i.e. the "hidden unit length" or the dimensionality of the hidden state. It also determines the output dimension, since the LSTM outputs its hidden state at the end.

Lastly, the 32-dimensional output vector from the last time-step is fed to a Dense layer of 2 neurons, which basically means the full 32-length vector is connected to both neurons.
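To make the shape bookkeeping concrete, here is a minimal sketch (the dummy batch and variable names are mine) that traces a batch through the model above:

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))  # 10 timesteps, 64 features per step
model.add(Dense(2))

x = np.random.random((4, 10, 64))   # batch of 4 sequences, each 10 x 64
print(model.predict(x).shape)       # (4, 2): the 32-dim final hidden state, projected to 2 by Dense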

More reading with somewhat helpful answers:

  • Understanding Keras LSTMs
  • What exactly am I configuring when I create a stateful LSTM layer with N units
  • Initializing LSTM hidden states with Keras



Answer 2:


@Sticky, you are wrong in your interpretation. The input tensor has shape (batch_size, sequence_length/timesteps, feature_size), so each sample is a 10 x 64 tensor: 10 timesteps (like 10 words), each with 64 features (just like a word embedding). The 32 is the number of units, which makes the output vector size 32.

The output will have the following shape (see the sketch after this list):

  1. (batch, arbitrary_steps, units) if return_sequences=True.
  2. (batch, units) if return_sequences=False.
  3. The memory states will have a size of "units".
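A minimal sketch (the dummy data and model names are my own) showing both cases:

from keras.models import Sequential
from keras.layers import LSTM
import numpy as np

x = np.random.random((8, 10, 64))   # (batch, timesteps, features)

seq_model = Sequential([LSTM(32, return_sequences=True, input_shape=(10, 64))])
print(seq_model.predict(x).shape)   # (8, 10, 32): one 32-dim hidden state per timestep

last_model = Sequential([LSTM(32, return_sequences=False, input_shape=(10, 64))])
print(last_model.predict(x).shape)  # (8, 32): only the final hidden state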



Answer 3:


I don't think you are right. Actually, the number of timesteps does not affect the number of parameters in an LSTM.

from keras.layers import LSTM
from keras.models import Sequential

time_step = 13
feature = 5
hidden_feature = 10

model = Sequential()
model.add(LSTM(hidden_feature, input_shape=(time_step, feature)))
model.summary()

# Same layer with a different number of timesteps
time_step = 100
model2 = Sequential()
model2.add(LSTM(hidden_feature, input_shape=(time_step, feature)))
model2.summary()

The result:

Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10)                640       
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_2 (LSTM)                (None, 10)                640       
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
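The 640 parameters can be checked by hand: each of the four LSTM gates has an input weight matrix, a recurrent weight matrix, and a bias, so the count is 4 * (units * features + units * units + units), with no timestep term anywhere. A quick check (a sketch of the arithmetic, not Keras code; variable names are mine):

units = 10     # hidden_feature above
features = 5   # feature above
params = 4 * (units * features + units * units + units)
print(params)  # 640, independent of time_step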


Source: https://stackoverflow.com/questions/49892528/what-is-the-architecture-behind-the-keras-lstm-layer-implementation
