Question
Can someone explain how I can initialize the hidden state of an LSTM in TensorFlow? I am trying to build an LSTM recurrent autoencoder, so once that model is trained I want to transfer the learned hidden state of the unsupervised model to the hidden state of the supervised model. Is that even possible with the current API? This is the paper I am trying to recreate:
http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf
Answer 1:
Yes - this is possible but truly cumbersome. Let's go through an example.
Defining a model:
from keras.layers import LSTM, Input
from keras.models import Model
input = Input(batch_shape=(32, 10, 1))
lstm_layer = LSTM(10, stateful=True)(input)
model = Model(input, lstm_layer)
model.compile(optimizer="adam", loss="mse")
It's important to build and compile the model first, because compilation resets the initial states. Moreover, you need to specify a batch_shape that fixes the batch_size, because in this scenario the network should be stateful (which is done by setting stateful=True).
Now we can set the values of the initial states:
import numpy
import keras.backend as K
hidden_states = K.variable(value=numpy.random.normal(size=(32, 10)))
cell_states = K.variable(value=numpy.random.normal(size=(32, 10)))
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
Note that you need to provide the states as Keras variables. states[0] holds the hidden states and states[1] holds the cell states.
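To make the difference between the two states concrete, here is a minimal NumPy sketch of a single LSTM time step. It is a textbook formulation, not Keras's internal implementation: the helper names (sigmoid, lstm_step), the gate ordering, and the random weights are all assumptions made for illustration. The shapes match the example above (batch of 32, 10 units).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper); returns the new (h, c)."""
    units = h_prev.shape[-1]
    z = x @ W + h_prev @ U + b                # shape: (batch, 4 * units)
    i = sigmoid(z[:, 0 * units:1 * units])    # input gate
    f = sigmoid(z[:, 1 * units:2 * units])    # forget gate
    g = np.tanh(z[:, 2 * units:3 * units])    # candidate cell values
    o = sigmoid(z[:, 3 * units:4 * units])    # output gate
    c = f * c_prev + i * g                    # cell state: long-term memory
    h = o * np.tanh(c)                        # hidden state: the layer's output
    return h, c

rng = np.random.default_rng(0)
batch, features, units = 32, 1, 10

# Random weights stand in for trained ones (assumption for illustration).
W = rng.normal(size=(features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

x = rng.normal(size=(batch, features))
h0 = rng.normal(size=(batch, units))   # initial hidden state
c0 = rng.normal(size=(batch, units))   # initial cell state

h1, c1 = lstm_step(x, h0, c0, W, U, b)
print(h1.shape, c1.shape)  # (32, 10) (32, 10)
```

Both states have shape (batch_size, units), which is why the arrays assigned above are of size (32, 10).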
Hope that helps.
Answer 2:
Assuming the RNN is layer 1 and the hidden/cell states are NumPy arrays, you can do this:
from keras import backend as K
K.set_value(model.layers[1].states[0], hidden_states)
K.set_value(model.layers[1].states[1], cell_states)
States can also be set using
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
but when I did it this way my state values stayed constant even after stepping the RNN.
Answer 3:
I used this approach, and it worked for me:
from keras.layers import LSTM
# cell_num, input, h_prev and c_prev must already be defined
lstm_cell = LSTM(cell_num, return_state=True)
output, h, c = lstm_cell(input, initial_state=[h_prev, c_prev])
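For reference, the snippet above can be expanded into a complete model roughly as follows. This is a sketch that assumes a TensorFlow 2.x installation and uses the functional API; the sizes and the names seq_in, h_in and c_in are made up for illustration and do not come from the original answer. The starting states are fed as extra model inputs, so any arrays of shape (batch_size, units) can be supplied at prediction time.

```python
import numpy as np
from tensorflow import keras

timesteps, features, units = 10, 1, 10  # illustrative sizes (assumption)

# The sequence input plus one input per starting state.
seq_in = keras.Input(shape=(timesteps, features))
h_in = keras.Input(shape=(units,))  # initial hidden state
c_in = keras.Input(shape=(units,))  # initial cell state

# return_state=True makes the layer return its final h and c as well.
lstm = keras.layers.LSTM(units, return_state=True)
output, h_out, c_out = lstm(seq_in, initial_state=[h_in, c_in])
model = keras.Model([seq_in, h_in, c_in], [output, h_out, c_out])

# Random data stands in for real sequences and learned states.
x = np.random.normal(size=(4, timesteps, features)).astype("float32")
h0 = np.random.normal(size=(4, units)).astype("float32")
c0 = np.random.normal(size=(4, units)).astype("float32")

out, h, c = model.predict([x, h0, c0], verbose=0)
print(out.shape, h.shape, c.shape)  # (4, 10) (4, 10) (4, 10)
```

Because return_sequences is left at its default of False, out is the hidden state after the last time step, so it equals h. The returned h and c can be passed as the initial state of another LSTM layer with the same number of units.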
Source: https://stackoverflow.com/questions/42415909/initializing-lstm-hidden-state-tensorflow-keras