I am using a recurrent neural network (RNN) for forecasting, but for some strange reason it always outputs 1. I explain this with a toy example:
Example
Consider a matrix M of dimensions (360, 5) and a vector Y containing the row sums of M. Now, using an RNN, I want to predict Y from M. Using the rnn R package, I trained the model as:
library(rnn)
M <- matrix(c(1:1800),ncol=5,byrow = TRUE) # Matrix (say features)
Y <- apply(M,1,sum) # Output equals row sum of M
mt <- array(c(M),dim=c(NROW(M),1,NCOL(M))) # matrix formatting as [samples, timesteps, features]
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y))) # formatting
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=1000) # training
One strange thing I observed while training is that the epoch error is always 4501. Ideally, the epoch error should decrease as the number of epochs increases.
Next, I created a test dataset with the same structure as above:
M2 <- matrix(c(1:15),nrow=3,byrow = TRUE)
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)
The prediction is always 1. What can be the reason for the constant epoch error and the constant output?
UPDATE # 1
The answer provided by @Barker does not work on my problem. To keep the question open, I am sharing minimal data via Dropbox links as traindata and testdata, together with my R code below.
Data details: the column 'power' is the response variable; it is a function of temperature, humidity, and the power consumed on previous days, from day 1 to day 14.
normalize_data <- function(x){
normalized = (x-min(x))/(max(x)-min(x))
return(normalized)
}
#read test and train data
traindat <- read.csv(file = "train.csv")
testdat <- read.csv(file = "test.csv")
# column "power" is response variable and remaining are predictors
# predictors in traindata
trainX <- traindat[, 1:(dim(traindat)[2] - 1)]
# response of train data
trainY <- traindat$power
# arrange data acc. to RNN as [samples,time steps, features]
tx <- array(as.matrix(trainX), dim=c(NROW(trainX), 1, NCOL(trainX)))
tx <- normalize_data(tx) # normalize data in range of [0,1]
ty <- array(trainY, dim=c(NROW(trainY), 1, NCOL(trainY))) # arrange response acc. to predictors
# train model
model <- trainr(X = tx, Y = ty, learningrate = 0.08, hidden_dim = 6, numepochs = 400)
# predictors in test data
testX <- testdat[, 1:(dim(testdat)[2] - 1)]
testX <- normalize_data(testX) # normalize data in range of [0,1]
#testY <- testdat$power
# arrange data acc. to RNN as [samples,time steps, features]
tx2 <- array(as.matrix(testX), dim=c(NROW(testX), 1, NCOL(testX))) # predict
pred <- predictr(model,tx2)
pred
I varied the parameters learningrate, hidden_dim, and numepochs, but the output is still either 0.9 or 1.
Most RNNs don't handle data that doesn't have a constant mean well. One strategy for dealing with this is differencing the data. To see how this works, let's work with the base R time series co2. This is a time series with nice smooth seasonality and trend, so we should be able to forecast it.
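As a quick toy illustration of what differencing does (we will get to co2 shortly): a series with a linear trend loses that trend once it is differenced.
# A trending series becomes roughly constant after differencing
x <- ts(1:120 + rnorm(120), frequency = 12) # linear trend plus noise
plot(diff(x, lag = 12))                     # hovers around 12 (the yearly increase), no trend left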
For our model, the input matrix is going to be the "seasonality" and "trend" components of the co2 time series, created using the stl decomposition. So let's make our training and testing data as you did before and train the model (note that I reduced numepochs for runtime). I will use all the data up to the last year and a half for training, and the last year and a half for testing:
#Create the STL decomposition
sdcomp <- stl(co2, s.window = 7)$time.series[,1:2]
Y <- window(co2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))
#Taken from OP's code
mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y)))
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)
Now we can create our predictions on the last year and a half of testing data:
M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)
output:
[,1]
[1,] 1
[2,] 1
[3,] 1
[4,] 1
[5,] 1
[6,] 1
[7,] 1
[8,] 1
[9,] 1
[10,] 1
[11,] 1
[12,] 1
[13,] 1
[14,] 1
[15,] 1
[16,] 1
[17,] 1
[18,] 1
Eww, it is all ones again, just like in your example. Now let's try this again, but this time we will difference the data. Since we are trying to make our predictions one and a half years out, we will use 18 as our differencing lag, as those are the values we would know 18 months ahead of time.
dco2 <- diff(co2, 18)
sdcomp <- stl(dco2, s.window = "periodic")$time.series[,1:2]
plot(dco2)
Great, the trend is now gone, so our neural net should be able to find the pattern better. Let's try again with the new data.
Y <- window(dco2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))
mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y)))
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)
M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
(preds <- predictr(model,mt2))
output:
[,1]
[1,] 9.999408e-01
[2,] 9.478496e-01
[3,] 6.101828e-08
[4,] 2.615463e-08
[5,] 3.144719e-08
[6,] 1.668084e-06
[7,] 9.972314e-01
[8,] 9.999901e-01
[9,] 9.999916e-01
[10,] 9.999916e-01
[11,] 9.999916e-01
[12,] 9.999915e-01
[13,] 9.999646e-01
[14,] 1.299846e-02
[15,] 3.114577e-08
[16,] 2.432247e-08
[17,] 2.586075e-08
[18,] 1.101596e-07
OK, now there is something there! Let's see how it compares to what we were trying to forecast, dco2:
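The comparison plot is not reproduced here, but it can be drawn along these lines (a sketch of my own; the predictions sit on the sigmoid's [0, 1] scale, so it is the shape rather than the level that is comparable):
# Plot the actual differenced series and the RNN predictions over the test period
actual <- window(dco2, start = c(1996, 7))
par(mfrow = c(2, 1))
plot(actual, main = "dco2 (actual)")
plot(ts(as.numeric(preds), start = c(1996, 7), frequency = 12), main = "RNN predictions", ylab = "preds")
par(mfrow = c(1, 1))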
Not ideal, but it is finding the general "up down" pattern of the data. Now all you have to do is tinker with your learning rate and start optimizing with all those lovely hyperparameters that make working with neural nets such a joy. When it is working how you want, you can just take your final output and add back in the last 18 months of your training data to undo the differencing.
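As a rough sketch of that last step (my own addition, and assuming preds has first been rescaled from the sigmoid's [0, 1] range back to the units of dco2), undoing the lag-18 difference just means adding each forecast to the co2 value observed 18 months earlier:
# Hypothetical inversion of diff(co2, 18): co2[t] = dco2[t] + co2[t - 18]
# (assumes preds is already on the dco2 scale)
base <- window(co2, start = c(1995, 1), end = c(1996, 6)) # values 18 months before the test period
co2_fc <- ts(as.numeric(preds) + as.numeric(base), start = c(1996, 7), frequency = 12)
plot(window(co2, start = c(1996, 7)), ylab = "co2") # actual values over the test period
lines(co2_fc, col = "red")                          # reconstructed forecast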
From my review of the examples in the package (see ?trainr), the inputs to the training function have to be binary. The package provides the functions int2bin and bin2int for the conversion. I have not been able to get them to work correctly, but it appears that conversion to binary is needed.
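For reference, the binary-addition example in the help page runs roughly along these lines (a sketch based on my reading of ?trainr; the int2bin/bin2int signatures and argument names are assumptions, so check the documentation):
library(rnn)
set.seed(1)
# Binary addition: integers become fixed-length bit vectors via int2bin, the bit
# positions act as time steps, and bin2int maps predictions back to integers
X1 <- sample(0:127, 5000, replace = TRUE)
X2 <- sample(0:127, 5000, replace = TRUE)
Y  <- X1 + X2
X1b <- int2bin(X1, length = 8)                  # (samples, 8 bits)
X2b <- int2bin(X2, length = 8)
Yb  <- int2bin(Y, length = 8)
X   <- array(c(X1b, X2b), dim = c(dim(X1b), 2)) # [samples, time steps, features]
model <- trainr(Y = Yb, X = X, learningrate = 0.1, hidden_dim = 10, numepochs = 5)
preds <- bin2int(round(predictr(model, X)))     # back to integers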
Source: https://stackoverflow.com/questions/41879049/why-does-rnn-always-output-1