Question
I am trying to predict multidimensional values in sequence, e.g.
[[0, 0, 2], [1, 0, 3], [2, 3, 4], [3, 2, 5], [4, 0, 6], [5, 0, 7], ... ]
and I want each of the [x, y, z] dimensions to be captured by the LSTM.
When I attempt to run model.fit() on the model below, I get the error in the title:
ValueError: Error when checking target: expected time_distributed_19 to have 3 dimensions, but got array with shape (1824, 3)
I know the output layer should have three dimensions, but I'm getting confused about how the LSTM should deal with my sequence of n-dimensional values.
Here is my model. Note that if I uncomment the Flatten() line, as some solutions suggest, I get a nondescript AssertionError on model.compile().
# X shape: (1824, 256, 3)
# Y shape: (1824, 3)
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Flatten, TimeDistributed

model = Sequential()
model.add(LSTM(units=128, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=128, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=128, return_sequences=True))
model.add(Dropout(0.2))
# model.add(Flatten())
model.add(TimeDistributed(Dense(Y.shape[1], activation='softmax')))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
Here is the model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_145 (LSTM)              (None, 256, 128)          67584
_________________________________________________________________
dropout_140 (Dropout)        (None, 256, 128)          0
_________________________________________________________________
lstm_146 (LSTM)              (None, 256, 128)          131584
_________________________________________________________________
dropout_141 (Dropout)        (None, 256, 128)          0
_________________________________________________________________
time_distributed_19 (TimeDis (None, 256, 3)            387
=================================================================
Total params: 199,555
Trainable params: 199,555
Non-trainable params: 0
_________________________________________________________________
None
This model was running before I added the TimeDistributed() wrapper (though I had to remove return_sequences=True from the last hidden layer for it to work), but I added TimeDistributed() because I don't think the individual variables of my 3-dimensional feature values were being captured.
Any insight is greatly appreciated, thank you.
UPDATE
Thanks to nuric's quick answer to my initial question, I confirmed that the way I was previously doing it was the "right way", and that my confusion stems from the predictions I'm getting. Given a sequence from X, I get a 3D vector like this: [9.915069e-01 1.084390e-04 8.384804e-03] (and it's always approximately [1, 0, 0]).
In my previous LSTM models, this prediction vector's max value corresponded to the index in my one-hot encoding of letters/words, but here what I want is predictions for the x, y, and z values of the next 3D vector in the sequence.
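Note that softmax forces the three outputs to sum to 1, which is why the predictions look like a near-one-hot distribution. For continuous [x, y, z] targets, a regression head is the usual fix. A minimal sketch of that setup (the linear activation, mse loss, and the tiny stand-in shapes are assumptions for illustration, not from the thread):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Tiny stand-in data: 4 sequences of 16 timesteps of [x, y, z]
# (the real X is (1824, 256, 3)).
X = np.random.rand(4, 16, 3).astype("float32")
Y = np.random.rand(4, 3).astype("float32")   # next [x, y, z] per sequence

model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(3, activation='linear'))     # one continuous output per coordinate
model.compile(loss='mse', optimizer='adam')

pred = model.predict(X[:1])
print(pred.shape)   # (1, 3): unbounded real values, not a probability simplex
```

With this head, each output unit can take any real value, so the prediction can be read directly as the next [x, y, z] rather than as class probabilities.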
Answer 1:
You have a mismatch between what the model predicts (currently 3D) and what the target is (2D). You have two options:
- Apply Flatten and remove TimeDistributed, which means the model will predict based on the entire sequence.
- Remove return_sequences=True from the last LSTM to let the LSTM compress the sequence, and again remove TimeDistributed. This way the model will predict based on the last LSTM output, not the sequences.
I would prefer the second option given the size of the sequence and the number of hidden units you have. Option one will create a very large kernel for the Dense layer if you just flatten the sequence, i.e. too many parameters.
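The second option can be sketched as follows; the layer sizes and sequence length are shrunk here for illustration (the real data is (1824, 256, 3) with 128 units):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Tiny stand-in for the real data.
X = np.random.rand(4, 16, 3).astype("float32")
Y = np.random.rand(4, 3).astype("float32")

model = Sequential()
model.add(LSTM(units=32, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=32))              # return_sequences=False: output is (None, 32)
model.add(Dropout(0.2))
model.add(Dense(Y.shape[1], activation='softmax'))  # (None, 3), matches 2D target Y
model.compile(loss='categorical_crossentropy', optimizer='adam')
print(model.output_shape)
```

Because the last LSTM returns only its final output instead of the full sequence, the Dense layer produces a (None, 3) tensor that lines up with the 2D target, and no TimeDistributed wrapper is needed.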
Source: https://stackoverflow.com/questions/51014044/keras-lstm-multidimensional-output-error-expected-time-distributed-17-to-have