Question
I want to create a basic LSTM network that accepts sequences of 5-dimensional vectors (for example, as N x 5 arrays) and returns the corresponding sequences of 4-dimensional hidden and cell vectors (N x 4 arrays), where N is the number of time steps.
How can I do this in TensorFlow?
ADDED
So far, I got the following code working:
import numpy as np
import tensorflow as tf

num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)
timesteps = 18
num_input = 5
X = tf.placeholder("float", [None, timesteps, num_input])
# static_rnn wants a list of `timesteps` tensors of shape [batch, num_input]
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size=(12, 18, 5))
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()
However, there are many open questions:
- Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
- Why do we split the data by time steps (using unstack)?
- How to interpret the "outputs" and "states"?
Answer 1:
Why number of time steps is preset? Shouldn't LSTM be able to accept sequences of arbitrary length?
If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn. You can refer here to understand the difference between them.
For example:
import numpy as np
import tensorflow as tf

num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)
num_input = 5
# The time dimension is None, so no fixed number of timesteps is baked in.
X = tf.placeholder("float", [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size=(12, 18, 5))   # 18 time steps
res = sess.run(outputs, feed_dict={X: x_val})
x_val = np.random.normal(size=(12, 16, 5))   # 16 time steps: same graph works
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()
dynamic_rnn requires all sequences within one batch to have the same length, but when you need arbitrary lengths inside a single batch, you can pad the batch data and then specify the true length of each sequence with the sequence_length parameter.
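For example, here is a minimal sketch of that pad-then-specify approach, assuming a toy batch of two sequences with true lengths 3 and 5 (all shapes and values are only illustrative):

import numpy as np
import tensorflow as tf

num_units = 4
num_input = 5
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)
X = tf.placeholder("float", [None, None, num_input])
seq_len = tf.placeholder(tf.int32, [None])   # one true length per sequence
outputs, states = tf.nn.dynamic_rnn(lstm, X, sequence_length=seq_len,
                                    dtype=tf.float32)

# Batch of 2 sequences, zero-padded to the longest length (5).
x_val = np.zeros((2, 5, num_input))
x_val[0, :3, :] = np.random.normal(size=(3, num_input))   # true length 3
x_val[1, :5, :] = np.random.normal(size=(5, num_input))   # true length 5

sess = tf.Session()
sess.run(tf.global_variables_initializer())
res = sess.run(outputs, feed_dict={X: x_val, seq_len: [3, 5]})
# Outputs past each sequence's true length are zero, and the state
# stops updating there.
sess.close()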
Why do we split data by time steps (using unstack)?
Only static_rnn needs the data split with unstack; this follows from their different input requirements. The input of static_rnn is the [timesteps, batch_size, features] data split along the time axis, i.e. a list of timesteps 2D tensors of shape [batch_size, features]. The input of dynamic_rnn, by contrast, is a single 3D tensor of shape [timesteps, batch_size, features] or [batch_size, timesteps, features], depending on whether time_major is True or False.
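A small sketch of the two input conventions (the printed shapes are TF 1.x graph-time shapes, with ? for the unknown batch size):

import tensorflow as tf

timesteps, num_input = 18, 5
X = tf.placeholder("float", [None, timesteps, num_input])  # [batch, time, features]

# static_rnn: a Python list of `timesteps` tensors of shape [batch, features]
x_list = tf.unstack(X, timesteps, 1)
print(len(x_list), x_list[0].shape)   # 18 (?, 5)

# dynamic_rnn: the 3D tensor directly; with time_major=True it would
# instead expect shape [timesteps, batch, features]
X_time_major = tf.transpose(X, [1, 0, 2])   # shape (18, ?, 5)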
How to interpret the "outputs" and "states"?
For LSTMCell, states is a pair of tensors, so its stacked shape is [2, batch_size, num_units]: one [batch_size, num_units] tensor is the cell state C, and the other is the hidden state h. For GRUCell, in the same way, states is a single tensor of shape [batch_size, num_units].
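To make the LSTM case concrete, here is a minimal sketch (the toy shapes 12 x 18 x 5 are my own, for illustration) that inspects the two parts of the state, states.c and states.h:

import numpy as np
import tensorflow as tf

lstm = tf.nn.rnn_cell.LSTMCell(num_units=4)
X = tf.placeholder("float", [None, None, 5])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

# For LSTMCell, states is an LSTMStateTuple(c=..., h=...)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
c_val, h_val = sess.run([states.c, states.h],
                        feed_dict={X: np.random.normal(size=(12, 18, 5))})
print(c_val.shape, h_val.shape)   # (12, 4) (12, 4)
sess.close()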
outputs holds the output of every time step, so by default (time_major=False) its shape is [batch_size, timesteps, num_units]. From this you can easily conclude that the final hidden state equals the last output: states.h == outputs[:, -1, :] (equivalently, states[1][b, :] == outputs[b, -1, :] for each batch element b).
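A quick numerical check of that identity, reusing the same toy setup as above (no padding, so the last time step is the true final step for every sequence):

import numpy as np
import tensorflow as tf

lstm = tf.nn.rnn_cell.LSTMCell(num_units=4)
X = tf.placeholder("float", [None, None, 5])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
out_val, h_val = sess.run([outputs, states.h],
                          feed_dict={X: np.random.normal(size=(12, 18, 5))})
print(out_val.shape)                          # (12, 18, 4)
print(np.allclose(out_val[:, -1, :], h_val))  # True: h is the last output
sess.close()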
Source: https://stackoverflow.com/questions/54764500/how-to-create-end-execute-a-basic-lstm-network-in-tensorflow