Keras LSTM: a time-series multi-step multi-features forecasting - poor results

Submitted by 浪尽此生 on 2019-12-03 06:35:23

It looks like there is some confusion about how to organise the data to train an RNN. So let's cover the questions:

  1. Once you have a 2D dataset of shape (total_samples, 5), you can use the TimeseriesGenerator to create a sliding window that will generate batches of shape (batch_size, past_timesteps, 5) for you. In this case, you will use .fit_generator to train the network (see the generator sketch right after this list).
  2. If you get the same result, 50 epochs should be fine. You usually adjust this based on the performance of your network, but keep it fixed if you are comparing two different network architectures.
  3. The architecture is really large because you aim to predict all 672 future values at once. You can instead design the network so that it learns to predict one measurement at a time. At prediction time you predict one point, feed it back in, and repeat until you have all 672.
  4. This ties into answer 3: you can learn to predict one step at a time and then chain the predictions out to n future steps after training (see the loop sketch after the model code below).
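
A minimal sketch of how the generator from point 1 could be wired up. The names here are assumptions for illustration: data is the (total_samples, 5) array, the target VAR is taken to be column 0, and past_timesteps is a placeholder window length.

import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator

# data: 2D array of shape (total_samples, 5); column 0 is assumed to hold VAR
past_timesteps = 24            # placeholder window length
targets = data[:, 0]           # one target value per sample

generator = TimeseriesGenerator(data, targets,
                                length=past_timesteps,
                                batch_size=64)

# each batch has shape (batch_size, past_timesteps, 5) for X and (batch_size,) for y
model.fit_generator(generator, epochs=50)   # model defined below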

The single point prediction model could look like:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(past_timesteps, 5)))
model.add(LSTM(64))
model.add(Dense(1))  # single value: the next step of the target variable
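
Once such a single-step model is trained, the chained forecast from answers 3 and 4 can be produced by feeding each prediction back in. This is only a rough sketch under illustrative assumptions: VAR sits in column 0 of the feature vector, last_known_window is the most recent observed window, and the other 4 features for future steps are simply carried over from the last observed row (replace them with real future values if you have them).

import numpy as np

window = last_known_window.copy()      # shape (past_timesteps, 5)
predictions = []

for step in range(672):
    # predict the next value of VAR from the current window
    next_var = model.predict(window[np.newaxis, :, :])[0, 0]
    predictions.append(next_var)

    # build the next input row: predicted VAR plus the other (assumed known) features
    next_row = window[-1].copy()
    next_row[0] = next_var

    # slide the window forward by one step
    window = np.vstack([window[1:], next_row])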

1) Batches are not the sequences. The input X is the sequence, and it should have the shape [None, sequence_length, number_of_features]. The 1st axis is filled in by Keras with the batches, but these are not the sequences: the sequences live on the 2nd axis, and the feature columns on the 3rd. A batch size of 672 might be too large; you can try smaller values such as 128, 64, or 32.
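
As a concrete illustration of those axes (the array sizes here are placeholders), a windowed training set and a smaller batch size might look like:

import numpy as np

# 1000 windows, each past_timesteps long, 5 features per timestep
X = np.zeros((1000, past_timesteps, 5))
Y = np.zeros((1000, 1))

# Keras fills the 1st (batch) axis itself; you only choose how large each batch is
model.fit(X, Y, epochs=50, batch_size=64)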

2) It is almost certain that your network overfits. The network has too many LSTM layers; I would try just 2 LSTM layers, as @nuric suggested, and see how it performs.

3) There also seems to be some confusion about the LSTM units (the LSTM size). It does not have to be 672; in fact, 672 is too large. A good starting point is 128.

4) The NN architecture above predicts a single value of VAR. In that case, make sure your Y has a single value for each sequence in X (see the sketch below).
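
A minimal way to build such X and Y pairs by hand from the (total_samples, 5) array. The helper name and the choice of column 0 for VAR are illustrative assumptions:

import numpy as np

def make_windows(data, past_timesteps, target_col=0):
    X, Y = [], []
    for i in range(len(data) - past_timesteps):
        X.append(data[i:i + past_timesteps])             # one sequence of past values
        Y.append(data[i + past_timesteps, target_col])   # the single next value of VAR
    return np.array(X), np.array(Y)

X, Y = make_windows(data, past_timesteps=24)
# X.shape == (total_samples - 24, 24, 5), Y.shape == (total_samples - 24,)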

5) Alternatively, you can make the last LSTM output a sequence. In that case, each Y entry is a VAR sequence shifted one step ahead. Going back to 4), make sure Y has the correct shape, matching that of X and the NN architecture.
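
A sketch of that sequence-output variant, with the caveat that the layer sizes are just starting points: the last LSTM keeps return_sequences=True and a TimeDistributed Dense produces one VAR value per timestep, so Y must have shape (num_windows, past_timesteps, 1) and hold the VAR series shifted one step ahead.

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(past_timesteps, 5)))
model.add(LSTM(64, return_sequences=True))   # last LSTM also returns a sequence
model.add(TimeDistributed(Dense(1)))         # one VAR prediction per timestep
# Y shape: (num_windows, past_timesteps, 1), the VAR series shifted one step ahead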

6) Your plot shows that 50 epochs are enough for convergence. Once you adjust X, Y, and the NN, repeat this check to choose the number of epochs.

7) Lastly, an idea about the dates: if you want to include the dates in X, one option is to one-hot encode them as weekdays. Your X would then be [dewpt, hum, press, temp, MON, TUE, ..., SAT, SUN].
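
One way to build those weekday columns, assuming the timestamps live in a pandas column named 'date' (the dataframe and column names are just examples):

import pandas as pd

df['weekday'] = pd.to_datetime(df['date']).dt.dayofweek          # 0 = Monday ... 6 = Sunday
weekday_dummies = pd.get_dummies(df['weekday'], prefix='day')    # up to 7 one-hot columns

# final feature matrix: the 4 measurements plus the weekday indicators
X_features = pd.concat([df[['dewpt', 'hum', 'press', 'temp']], weekday_dummies], axis=1).values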

OParryEvans

Your main issue here, as stated by others, is the size of your network. LSTMs are great for learning long-term dependencies, but they're certainly not magic. Personally, I haven't had much success with sequences of 100+ timesteps. What you will find is that you end up suffering from the exploding/vanishing gradients problem because your network is too large.

I won't reiterate what others have said about reshaping your data into the proper format, but once you have done that I recommend starting small (10/15 steps), predicting just the next step, and building it up from there. That's not to say you can't eventually predict a much longer sequence further into the future, but starting small will help you understand how the RNN is behaving before you scale it up.
