Question
I have a toy series dataset of 3-vectors in the form of
[[0, 0, 2], [1, 0, 3], [2, 0, 4], [3, 0, 2], [4, 0, 3], [5, 0, 4] ... [10001, 0, 4]]
x always goes up by one, y is always 0, z repeats 2, 3, 4. I want to predict the next 3-vector in the sequence given a starting sequence. I'm using a window size of 32, but have also tried 256 with identical results.
I normalize each dimension to be between 0 and 1 before sending it into the model. No matter how many layers, units, or features I add, the model doesn't get more accurate than about 0.5, and I'd like to understand why.
The prediction I get for the 33rd item is [4973.29 0.000 3.005]
whereas the real value is [32 0 4]
and I don't know if that's wrong because of the 0.5 accuracy or because of something else.
My model looks like this:
# X_modified shape: (9970, 32, 3)
# Y_modified shape: (9970, 3)
model = Sequential()
model.add(LSTM(units=128, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=128, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=128))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
Here's the summary and graphs:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_204 (LSTM) (None, 32, 128) 67584
_________________________________________________________________
dropout_199 (Dropout) (None, 32, 128) 0
_________________________________________________________________
lstm_205 (LSTM) (None, 32, 128) 131584
_________________________________________________________________
dropout_200 (Dropout) (None, 32, 128) 0
_________________________________________________________________
lstm_206 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_201 (Dropout) (None, 128) 0
_________________________________________________________________
dense_92 (Dense) (None, 3) 387
=================================================================
Total params: 331,139
Trainable params: 331,139
Non-trainable params: 0
_________________________________________________________________
Any insight is greatly appreciated, thank you!
Answer 1:
Right now your model is set up as a classifier, but from your description it seems you are trying to solve a regression problem. Let me know if I am misunderstanding.
Try changing the activation on the final dense layer to 'linear'. Also change the loss function to 'mean_squared_error' or another regression loss: https://keras.io/losses/
You will not be able to get an accuracy score on a regression problem; instead you will see the mean squared error, plus any other regression metrics you add, such as 'mae' for mean absolute error, which gives a more human-readable error number.
You should be able to solve this with a small network, so increasing the number of layers and units is not necessary.
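For example, a minimal regression version of your model might look like the sketch below (the layer size of 32 is illustrative, not a tuned value):

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window, n_features = 32, 3  # a 32-step window of 3-vectors, as in your data

model = Sequential([
    Input(shape=(window, n_features)),
    LSTM(32),          # a single small LSTM should be enough here
    Dense(n_features), # linear activation (the Dense default) for regression
])
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
```

With this setup the network outputs three real numbers directly instead of a softmax distribution, so the predictions are no longer forced to sum to 1 across the three dimensions.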
In response to your comment:
If the time series don't interact with each other, then there isn't really any reason to predict them at the same time, so you'll have to decide that first. Here is how you could change them to classification problems if you want.
Based on your description, I can't see a way to frame the X axis as a classification problem, since it is just an increasing number.
For the Y axis you could have the network predict whether the next point will be zero or not. So you would want the labels for this axis to be either 0 or 1 depending on whether the point is 0. The final layer would be a dense layer with 1 unit and sigmoid activation. However, if the occurrences of non-zero values are completely random, then it would be impossible to predict them accurately.
For the Z axis you could frame it as a multiclass classification problem. Your labels would have width 3, with the correct value one-hot encoded. So if the next Z-axis value was 2, your label would be [1, 0, 0]. The final layer should be a dense layer with 3 units. The activation should be softmax because you want it to select 1 of the 3 options, as opposed to a sigmoid activation, which could select any combination of the three.
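As a quick sketch, the one-hot Z-axis labels can be built with a broadcast comparison (assuming Z only ever takes the values 2, 3, and 4):

```python
import numpy as np

z_values = np.array([2, 3, 4, 2, 3, 4])  # next-step Z targets from the series
classes = np.array([2, 3, 4])            # the three possible Z values
# Compare each target against every class to get one-hot rows of width 3.
labels = (z_values[:, None] == classes).astype(np.float32)
print(labels[0])  # a Z value of 2 becomes [1. 0. 0.]
```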
You could predict all of these in one network by using Keras's functional API to build a multi-output model.
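A minimal sketch of that multi-output setup, with a shared LSTM encoder and one head per axis (the layer size and head names here are hypothetical), might be:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

inputs = Input(shape=(32, 3))   # 32-step window of 3-vectors
shared = LSTM(32)(inputs)       # shared recurrent encoding for both heads

# Head 1: is the next Y value zero or not? (binary classification)
y_out = Dense(1, activation='sigmoid', name='y_is_zero')(shared)
# Head 2: which of the three Z values comes next? (multiclass)
z_out = Dense(3, activation='softmax', name='z_class')(shared)

model = Model(inputs, [y_out, z_out])
model.compile(optimizer='adam',
              loss={'y_is_zero': 'binary_crossentropy',
                    'z_class': 'categorical_crossentropy'})
```

Each head then gets its own appropriate loss, rather than forcing one softmax over all three dimensions as in the original model.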
Source: https://stackoverflow.com/questions/51145007/3-vector-series-lstm-cant-break-0-5-accuracy