Question
I'm trying to build a seq2seq model in TensorFlow (1.4) using the tf.contrib.rnn.ConvLSTMCell API together with the tf.nn.dynamic_rnn API, but I get an error about the dimensions of the inputs.
My code is:
# features is an image sequence with shape [600, 400, 10],
# so features is a tensor with shape [batch_size, 600, 400, 10]
features = tf.transpose(features, [0, 3, 1, 2])
features = tf.reshape(features, [params['batch_size'], 10, 600, 400])

encoder_cell = tf.contrib.rnn.ConvLSTMCell(conv_ndims=2,
                                           input_shape=[600, 400, 1],
                                           output_channels=5,
                                           kernel_shape=[7, 7],
                                           skip_connection=False)

_, encoder_state = tf.nn.dynamic_rnn(cell=encoder_cell,
                                     inputs=features,
                                     sequence_length=[10] * params['batch_size'],
                                     dtype=tf.float32)
I get the following error:
ValueError: Conv Linear expects all args to be of same Dimension: [[2, 600, 400], [2, 600, 400, 5]]
Looking at the TF implementation, it seems that the input dynamic_rnn passes to the cell is only 3-dimensional per time step, in contrast to the hidden state, which is 4-dimensional. I tried to pass the input as a nested tuple, but it didn't work.
The problem is similar to TensorFlow dynamic_rnn regressor: ValueError dimension mismatch; it's slightly different though, as they're using a plain LSTMCell (which worked for me).
Can anyone give me a minimal example of how to use these two APIs together? Thanks!
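The shape mismatch in the error can be reproduced with a quick NumPy sketch (assuming batch_size = 2, as in the error message): after the transpose/reshape above, each per-time-step slice is 3-D, while a cell built with input_shape=[600, 400, 1] expects a trailing channel axis.

```python
import numpy as np

batch_size = 2
# the input fed to dynamic_rnn after the transpose/reshape above:
features = np.zeros((batch_size, 10, 600, 400), dtype=np.float32)

# dynamic_rnn slices along the time axis, so at each step the cell sees:
step_input = features[:, 0]                # shape (2, 600, 400) -- 3-D
print(step_input.shape)

# but ConvLSTMCell with input_shape=[600, 400, 1] expects a channel axis:
expected = np.expand_dims(step_input, -1)  # shape (2, 600, 400, 1) -- 4-D
print(expected.shape)
```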
Answer 1:
As I understand from https://github.com/iwyoo/ConvLSTMCell-tensorflow/issues/2, tf.nn.dynamic_rnn currently doesn't support ConvLSTMCell.
Therefore, as described in https://github.com/iwyoo/ConvLSTMCell-tensorflow/issues/1, you have to unroll the RNN manually.
An example is provided in the documentation: https://github.com/iwyoo/ConvLSTMCell-tensorflow/blob/master/README.md
Below I have modified your code according to that example, with comments where necessary.
height = 400
width = 400
time_steps = 25
channel = 10
batch_size = 2

p_input = tf.placeholder(tf.float32, [None, height, width, time_steps, channel])
p_label = tf.placeholder(tf.float32, [None, height, width, 3])

# creates a list of length time_steps; each element has shape (?, 400, 400, 1, 10)
p_input_list = tf.split(p_input, time_steps, 3)
# remove the third dimension; now each list element has shape (?, 400, 400, 10)
p_input_list = [tf.squeeze(p_input_, [3]) for p_input_ in p_input_list]

# ConvLSTMCell definition
cell = tf.contrib.rnn.ConvLSTMCell(conv_ndims=2,
                                   input_shape=[height, width, channel],
                                   output_channels=5,
                                   kernel_shape=[7, 7],
                                   skip_connection=False)

state = cell.zero_state(batch_size, dtype=tf.float32)  # initial state is zero

with tf.variable_scope("ConvLSTM") as scope:  # create the RNN with a loop
    for i, p_input_ in enumerate(p_input_list):
        if i > 0:
            scope.reuse_variables()
        # ConvLSTMCell takes a tensor of size [batch_size, height, width, channel].
        t_output, state = cell(p_input_, state)
Notice that you have to input an image whose height and width are equal. If your height and width don't match, you may have to pad the input.
Hope this helps.
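The split/squeeze bookkeeping above can be verified with plain NumPy (same axes, but smaller height/width here just to keep the sketch light):

```python
import numpy as np

height, width, time_steps, channel, batch_size = 4, 4, 25, 10, 2
p_input = np.zeros((batch_size, height, width, time_steps, channel),
                   dtype=np.float32)

# equivalent of tf.split(p_input, time_steps, 3): a list of time_steps arrays,
# each with a singleton time axis left over at position 3
p_input_list = np.split(p_input, time_steps, axis=3)
print(len(p_input_list), p_input_list[0].shape)  # 25 (2, 4, 4, 1, 10)

# equivalent of tf.squeeze(..., [3]): drop that singleton axis, leaving the
# [batch, height, width, channel] shape the cell expects at each step
p_input_list = [np.squeeze(p, axis=3) for p in p_input_list]
print(p_input_list[0].shape)                     # (2, 4, 4, 10)
```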
Answer 2:
In the meantime I figured out how to use the two APIs together. The trick is to pass a 5-D tensor as input to tf.nn.dynamic_rnn(), where the last dimension is the size of the "vector on the spatial grid" (which comes from the transformation of the input from 2D to 3D, inspired by the paper the implementation is based on: https://arxiv.org/pdf/1506.04214.pdf). In my case the vector size is 1, but I still have to expand the dimension.
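Concretely, for the input from the question this means the following shape manipulation (sketched in NumPy; the resulting 5-D array is the shape dynamic_rnn would then slice per time step for a cell built with input_shape=[600, 400, 1]):

```python
import numpy as np

batch_size = 2
# features as in the question: [batch_size, 600, 400, 10], 10 = time steps
features = np.zeros((batch_size, 600, 400, 10), dtype=np.float32)

# move the time axis to position 1: [batch_size, 10, 600, 400]
features = np.transpose(features, (0, 3, 1, 2))

# append the "vector on the spatial grid" axis (size 1 here):
# [batch_size, 10, 600, 400, 1] -- the 5-D input dynamic_rnn needs
features = np.expand_dims(features, -1)
print(features.shape)
```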
While fixing this error, another issue emerged: in section 3.1 of the paper mentioned above, they state the equations for the ConvLSTM. They use the Hadamard product for the weights connected to the cell outputs C. Printing the weights of my ConvLSTMCell in TensorFlow, it seems they don't use the weights Wci, Wcf and Wco at all. So, can anybody tell me the exact implementation of the TF ConvLSTMCell?
Btw, is the output of the TensorFlow ConvLSTMCell C or H (in the notation of the paper)?
Source: https://stackoverflow.com/questions/47459225/valueerror-convlstmcell-and-dynamic-rnn