Question
I want to create a basic LSTM network that accepts sequences of 5-dimensional vectors (for example, as N x 5 arrays) and returns the corresponding sequences of 4-dimensional hidden and cell vectors (N x 4 arrays), where N is the number of time steps.
How can I do this in TensorFlow?
ADDED
So far, I got the following code working:
import numpy as np
import tensorflow as tf

num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)
timesteps = 18
num_input = 5
X = tf.placeholder("float", [None, timesteps, num_input])
# static_rnn wants a list of `timesteps` tensors of shape [batch, num_input]
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size=(12, 18, 5))
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()
However, there are many open questions:
- Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
- Why do we split the data by time steps (using unstack)?
- How to interpret the "outputs" and "states"?
Answer 1:
Why number of time steps is preset? Shouldn't LSTM be able to accept sequences of arbitrary length?
If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn. You can refer here to understand the difference between them.
For example:
import numpy as np
import tensorflow as tf

num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)
num_input = 5
# The time dimension is None, so no fixed number of timesteps is baked in.
X = tf.placeholder("float", [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size=(12, 18, 5))   # 18 time steps
res = sess.run(outputs, feed_dict={X: x_val})
x_val = np.random.normal(size=(12, 16, 5))   # 16 time steps: same graph works
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()
dynamic_rnn requires all sequences within one batch to have the same length, but when you need arbitrary lengths inside a single batch, you can pad the batch data and then specify the true length of each sequence with the sequence_length parameter.
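For example, here is a minimal sketch of that pad-then-specify approach, assuming a toy batch of two sequences with true lengths 3 and 5 (all shapes and values are only illustrative):

import numpy as np
import tensorflow as tf

num_units = 4
num_input = 5
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)
X = tf.placeholder("float", [None, None, num_input])
seq_len = tf.placeholder(tf.int32, [None])   # one true length per sequence
outputs, states = tf.nn.dynamic_rnn(lstm, X, sequence_length=seq_len,
                                    dtype=tf.float32)

# Batch of 2 sequences, zero-padded to the longest length (5).
x_val = np.zeros((2, 5, num_input))
x_val[0, :3, :] = np.random.normal(size=(3, num_input))   # true length 3
x_val[1, :5, :] = np.random.normal(size=(5, num_input))   # true length 5

sess = tf.Session()
sess.run(tf.global_variables_initializer())
res = sess.run(outputs, feed_dict={X: x_val, seq_len: [3, 5]})
# Outputs past each sequence's true length are zero, and the state
# stops updating there.
sess.close()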
Why do we split data by time steps (using unstack)?
Only static_rnn needs the data split with unstack; this follows from their different input requirements. The input of static_rnn is the [timesteps, batch_size, features] data split along the time axis, i.e. a list of timesteps 2D tensors of shape [batch_size, features]. The input of dynamic_rnn, by contrast, is a single 3D tensor of shape [timesteps, batch_size, features] or [batch_size, timesteps, features], depending on whether time_major is True or False.
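A small sketch of the two input conventions (the printed shapes are TF 1.x graph-time shapes, with ? for the unknown batch size):

import tensorflow as tf

timesteps, num_input = 18, 5
X = tf.placeholder("float", [None, timesteps, num_input])  # [batch, time, features]

# static_rnn: a Python list of `timesteps` tensors of shape [batch, features]
x_list = tf.unstack(X, timesteps, 1)
print(len(x_list), x_list[0].shape)   # 18 (?, 5)

# dynamic_rnn: the 3D tensor directly; with time_major=True it would
# instead expect shape [timesteps, batch, features]
X_time_major = tf.transpose(X, [1, 0, 2])   # shape (18, ?, 5)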
How to interpret the "outputs" and "states"?
For LSTMCell, states is a pair of tensors, so its stacked shape is [2, batch_size, num_units]: one [batch_size, num_units] tensor is the cell state C, and the other is the hidden state h. For GRUCell, in the same way, states is a single tensor of shape [batch_size, num_units].
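To make the LSTM case concrete, here is a minimal sketch (the toy shapes 12 x 18 x 5 are my own, for illustration) that inspects the two parts of the state, states.c and states.h:

import numpy as np
import tensorflow as tf

lstm = tf.nn.rnn_cell.LSTMCell(num_units=4)
X = tf.placeholder("float", [None, None, 5])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

# For LSTMCell, states is an LSTMStateTuple(c=..., h=...)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
c_val, h_val = sess.run([states.c, states.h],
                        feed_dict={X: np.random.normal(size=(12, 18, 5))})
print(c_val.shape, h_val.shape)   # (12, 4) (12, 4)
sess.close()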
outputs holds the output of every time step, so by default (time_major=False) its shape is [batch_size, timesteps, num_units]. From this you can easily conclude that the final hidden state equals the last output: states.h == outputs[:, -1, :] (equivalently, states[1][b, :] == outputs[b, -1, :] for each batch element b).
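A quick numerical check of that identity, reusing the same toy setup as above (no padding, so the last time step is the true final step for every sequence):

import numpy as np
import tensorflow as tf

lstm = tf.nn.rnn_cell.LSTMCell(num_units=4)
X = tf.placeholder("float", [None, None, 5])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
out_val, h_val = sess.run([outputs, states.h],
                          feed_dict={X: np.random.normal(size=(12, 18, 5))})
print(out_val.shape)                          # (12, 18, 4)
print(np.allclose(out_val[:, -1, :], h_val))  # True: h is the last output
sess.close()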
Source: https://stackoverflow.com/questions/54764500/how-to-create-end-execute-a-basic-lstm-network-in-tensorflow