LSTM produces identical forecast for each input


Question


I've been working on reproducing a CNN-LSTM model for PV power forecasting from the literature for the past four weeks for my Master's thesis in Energy Science (http://www.mdpi.com/2076-3417/8/8/1286). However, I've been stuck on a seemingly simple issue: every LSTM configuration I've tried yields one of two things:

  • Ridiculous output that makes no sense whatsoever (a flat line, complete stochasticity, negative values, you name it)
  • Exactly the same (very believable) PV power forecast for every input.

I've done my best to reproduce the issue with as little code as possible:

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
from tensorflow.python.keras.layers import CuDNNLSTM
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from time import time

SUN_UP, SUN_DOWN = '03:00:00', '23:00:00'

df = pd.read_csv('../Model_Xander/CNN-LSTM-wang/pv_data/all_data_resample-15T_interpolate-4.csv',
           index_col = 0,
           parse_dates = True)
df = pd.DataFrame(df['151'])
df = df.between_time(SUN_UP, SUN_DOWN)

TIME_STEPS_PER_DAY = len(df.loc['1-1-2016'])
print('each day consists of ' + str(TIME_STEPS_PER_DAY) + ' time steps of 15 minutes')

df = df.values
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)  # StandardScaler ignores NaNs when fitting and keeps them in the output

df = np.nan_to_num(df_scaled, nan = -1)  # replace the remaining NaNs with -1 for the Masking layer below
#df = np.float16(df)

# Sliding-window sampler: each sample is `history_size` consecutive timesteps of
# `dataset`, and its label is the following `target_size` timesteps of `target`.
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index, step):
    indices = range(i-history_size, i)
    data.append(dataset[indices])

    if single_step:
      labels.append(target[i+target_size])
    else:
      labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)

TRAIN_TEST_SPLIT = round((2/3) * len(df))   # first two thirds of the data for training
TARGET_COL = df[:,0]                        # forecast the PV power column itself
HISTORY_SIZE = TIME_STEPS_PER_DAY * 10      # ten days of history per sample
TARGET_SIZE = TIME_STEPS_PER_DAY            # predict one full day ahead
STEP = TIME_STEPS_PER_DAY                   # slide the window one day at a time


x_train, y_train = multivariate_data(df, TARGET_COL, 0, TRAIN_TEST_SPLIT, HISTORY_SIZE, TARGET_SIZE, STEP)
x_test, y_test = multivariate_data(df, TARGET_COL, TRAIN_TEST_SPLIT, None, HISTORY_SIZE, TARGET_SIZE, STEP)
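
# Sanity check on the windowing: x_* should have shape (num_windows, HISTORY_SIZE, num_features)
# and y_* shape (num_windows, TARGET_SIZE), i.e. one full day of 15-minute targets
# per ten-day history window.
print('x_train:', x_train.shape, 'y_train:', y_train.shape)
print('x_test: ', x_test.shape, 'y_test: ', y_test.shape)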

lstm = Sequential()
lstm.add(Input(shape = (x_train.shape[1], x_train.shape[2])))
lstm.add(Masking(mask_value = -1))
lstm.add(LSTM(units = 100,
                  kernel_initializer = keras.initializers.Orthogonal(),
                  bias_initializer = keras.initializers.Constant(value=0.1),
                  return_sequences = True))
lstm.add(LSTM(units = 100,
                  kernel_initializer = keras.initializers.Orthogonal(),
                  bias_initializer = keras.initializers.Constant(value=0.1),
                  return_sequences = False))
lstm.add(Dense(units = 100,  activation = 'relu',
              kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
              bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.add(Dense(units = y_test.shape[1], activation = 'relu',
              kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
              bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.compile(loss = 'mse', optimizer = 'adam')
lstm.summary()

begin = time()
history = lstm.fit(x_train, y_train,
                  epochs = 5,
                  batch_size = 24,
                  validation_data = (x_test, y_test),
                  verbose = 1,
                  shuffle = False)

end = time()
print('it took ' + str(round(end-begin)) + ' seconds to train 5 epochs')

print(history.history)
predict = lstm.predict(x_test)
print(predict.shape)

plt.figure()
for i in range(10, 20):
    plt.plot(predict[i,:])

plt.figure()
for i in range(0, x_test.shape[0]):
    plt.plot(predict[i,:])

The problem is clearly seen in the last plot: [image: plot of 350 predictions overlaid on top of one another]

As you can see, all forecasts are identical; I have run out of ideas on how to combat this issue.

As far as I can deduce, there are a number of possible causes. First, my dataset contains a large number of NaNs, which I've done my best to combat with three methods (a sketch of the first two follows the list):

  1. Resampling from very high resolution (10 seconds) to standard resolution (15 minutes)
  2. Interpolating up to 4 consecutive NaNs with linear interpolation (any more seems unwise to me)
  3. The Masking layer an observant reader might have noticed in the model definition above

Even after these steps, my dataset still contains a large number of NaNs. I'm not really sure what to do about it, or whether the Masking layer is even doing its intended job. I do know for sure that the Masking layer cannot play nicely with CuDNNLSTM, and my normal LSTM model runs a LOT slower with the Masking layer.
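
For reference, a minimal sketch of the first two steps in pandas (my illustration; the file path here is hypothetical, and '151' is the sensor column used above):

import pandas as pd

raw = pd.read_csv('pv_data_10s.csv', index_col = 0, parse_dates = True)  # hypothetical path
resampled = raw[['151']].resample('15T').mean()               # step 1: 10 s -> 15 min resolution
filled = resampled.interpolate(method = 'linear', limit = 4)  # step 2: fill at most 4 consecutive NaNs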

The best I've been able to accomplish in terms of obtaining differently shaped predictions for differently shaped inputs is this: [image: differently shaped output for differently shaped inputs]. However, as you can see, this is just the same shape with a slightly different amplitude.

Another thing I've noticed is that when I feed in data from 9 other sensors as features (each with a similar amount and location of NaNs), the amplitude changes per prediction (yay), but the shape remains the same across all predictions (a sketch of the feature selection follows): [image: different amplitude, but the same shape]
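
For the multivariate variant, the only change before scaling is to keep several sensor columns instead of one (a sketch; the extra column IDs are hypothetical):

raw = pd.read_csv('all_data.csv', index_col = 0, parse_dates = True)  # hypothetical path
features = raw[['151', '152', '153']]  # each column becomes one input feature after scaling and windowing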

I will be uploading my model to my university's cluster (for the 200th time) to train for more than 5 epochs; who knows, maybe today is my lucky day. If anyone knows how to combat these issues, I would be very glad and thankful to hear your thoughts.

EDIT: In light of the lessons learned from the response below, I made the following changes (a sketch of the NaN-removal step follows this list):

  • Regularization and dropout to combat overfitting (which, left unchecked, leads to the average being forecast for every input)
  • Last LSTM layer with return_sequences = True
  • Added a Flatten layer after the last LSTM layer
  • Removed NaN values from my dataset, removing the need for the Masking layer and enabling the use of the CuDNNLSTM layer (training on the GPU, if I understand correctly)
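
A minimal sketch of one way to do the NaN removal (my illustration, assuming the windowed arrays produced by multivariate_data above, before any nan_to_num substitution):

import numpy as np

# keep only samples whose history window and target are both NaN-free
keep = ~(np.isnan(x_train).any(axis = (1, 2)) | np.isnan(y_train).any(axis = 1))
x_train, y_train = x_train[keep], y_train[keep]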

However, now that each day has a unique forecast, I noticed that increasing the number of units in the LSTM layers beyond somewhere between 20 and 50 (I tested 20 and 50) brings back the problem of every day getting the exact same forecast. I am still stumped as to why this is. (See below for the model I used to produce unique forecasts for each day.)

lstm = Sequential()
lstm.add(Input(shape = (x_train.shape[1], x_train.shape[2])))

lstm.add(CuDNNLSTM(units = 50,
                   kernel_initializer = keras.initializers.Orthogonal(),
                   kernel_regularizer = keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
                   bias_initializer = keras.initializers.Constant(value=0.1),
                   return_sequences = True))

#lstm.add(Dropout(rate=0.2))

lstm.add(CuDNNLSTM(units = 50,
                   kernel_initializer = keras.initializers.Orthogonal(),
                   kernel_regularizer = keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
                   bias_initializer = keras.initializers.Constant(value=0.1),
                   return_sequences = True))

lstm.add(Dropout(rate = 0.2))

lstm.add(Flatten())

lstm.add(Dense(units = int(0.5*x_train.shape[1]), activation = 'relu',
               kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer = keras.initializers.Constant(value=0.1)))

lstm.add(Dropout(rate = 0.2))

lstm.add(Dense(units = y_test.shape[1], activation = 'relu',
               kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer = keras.initializers.Constant(value=0.1)))

lstm.compile(loss = 'mse', optimizer = 'adam')
lstm.summary()
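
To check whether the collapse has returned after changing the unit count, a simple diagnostic (my own sketch, not part of the original model) is to measure how much the forecasts vary across test days:

predict = lstm.predict(x_test)
spread = predict.std(axis = 0)  # per-timestep standard deviation across all test-day forecasts
print('mean spread across forecasts:', spread.mean())  # near zero means every day gets the same curve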

Source: https://stackoverflow.com/questions/62557990/lstm-produces-identical-forecast-for-each-input
