Why does RNN always output 1

此生再无相见时 提交于 2019-12-06 04:43:15

问题


I am using Recurrent Neural Networks (RNN) for forecasting, but for some weird reason, it always outputs 1. Here I explain this with a toy example as:

Example Consider a matrix M of dimensions (360, 5), and a vector Y which contains rowsum of M. Now, using RNN, I want to predict Y from M. Using rnn R package, I trained model as

   library(rnn) 
    M <- matrix(c(1:1800),ncol=5,byrow = TRUE) # Matrix (say features) 
    Y <- apply(M,1,sum) # Output equls to row sum of M
    mt <- array(c(M),dim=c(NROW(M),1,NCOL(M))) # matrix formatting as [samples, timesteps, features]
    yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y))) # formatting
    model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=1000) # training

One strange thing I observed while training is that epoch error is always 4501. Ideally, epoch error should decrease with the increase in epochs.

Next, I created a test dataset with the same structure as above one as:

M2 <- matrix(c(1:15),nrow=3,byrow = TRUE)
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)

With prediction, I always get the output as 1. What can be the reason for the constant epoch error and the same output?

UPDATE # 1

Answer provided by @Barker does not work on my problem. To make it open, here I share minimalistic data via dropbox links as traindata, testadata, and my R code as.

Data details: column 'power' is response variable which is a function of temperature, humidity, and power consumed on previous days from day1 to day 14.

normalize_data <- function(x){
  normalized = (x-min(x))/(max(x)-min(x))
  return(normalized)
}

#read test and train data
traindat <- read.csv(file = "train.csv")
testdat <- read.csv(file = "test.csv")
# column "power" is response variable and remaining are predictors
# predictors in  traindata
trainX <- traindat[,1:dim(traindat)[2]-1]
# response of train data
trainY <- traindat$power
# arrange data acc. to RNN as [samples,time steps, features]
tx <- array(as.matrix(trainX), dim=c(NROW(trainX), 1, NCOL(trainX)))
tx <- normalize_data(tx) # normalize data in range of [0,1]
ty <- array(trainY, dim=c(NROW(trainY), 1, NCOL(trainY))) # arrange response acc. to predictors
# train model
model <- trainr(X = tx, Y = ty, learningrate = 0.08, hidden_dim = 6, numepochs = 400)

# predictors in test data
testX <- testdat[,1:dim(testdat)[2]-1]
testX <- normalize_data(testX) # normalize data in range of [0,1]
#testY <- testdat$power
# arrange data acc. to RNN as [samples,time steps, features]
tx2 <- array(as.matrix(testX), dim=c(NROW(testX), 1, NCOL(testX))) # predict
pred <- predictr(model,tx2)
pred

I varied parameters learning rate, hidden_dim, numepochs, but still it either results in 0.9 or 1.


回答1:


Most RNNs don't like data that don't have a constant mean. One strategy for dealing with this is differencing the data. To see how this works, lets work with a base R time series co2. This is a time series with a nice smooth seasonality and trend, so we should be able to forecast it.

For our model our input matrix is going to be the "seasonality" and "trend" of the co2 time series, created using the stl decomposition. So lets make our training and testing data as you did before and train the model (note I reduced the numepochs for runtime). I will use all the data up to the last year and a half for training, and then use the last year and a half for testing:

#Create the STL decomposition
sdcomp <- stl(co2, s.window = 7)$time.series[,1:2]

Y <- window(co2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))
#Taken from OP's code
mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y))) 
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)

Now we can create our predictions on the last year of testing data:

M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)

output:
      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    1
 [8,]    1
 [9,]    1
[10,]    1
[11,]    1
[12,]    1
[13,]    1
[14,]    1
[15,]    1
[16,]    1
[17,]    1
[18,]    1

Ewe, it is all ones again, just like in your example. Now lets try this again, but this time we will difference the data. Since we are trying to make our predictions one and a half years out, we will use 18 as our differencing lag as those are the values we would know 18 months ahead of time.

dco2 <- diff(co2, 18)
sdcomp <- stl(dco2, s.window = "periodic")$time.series[,1:2]
plot(dco2)

Great, the trend is now gone so our neural net should be able to find the pattern better. Lets try again with the new data.

Y <- window(dco2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))

mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y)))
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)

M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
(preds <- predictr(model,mt2))

output:
              [,1]
 [1,] 9.999408e-01
 [2,] 9.478496e-01
 [3,] 6.101828e-08
 [4,] 2.615463e-08
 [5,] 3.144719e-08
 [6,] 1.668084e-06
 [7,] 9.972314e-01
 [8,] 9.999901e-01
 [9,] 9.999916e-01
[10,] 9.999916e-01
[11,] 9.999916e-01
[12,] 9.999915e-01
[13,] 9.999646e-01
[14,] 1.299846e-02
[15,] 3.114577e-08
[16,] 2.432247e-08
[17,] 2.586075e-08
[18,] 1.101596e-07

Ok, now there is something there! Lets see how it compares to what were were trying to forecast, dco2:

Not ideal, but we but it is finding the general "up down" pattern of the data. Now all you have to do is tinker with your learning rates and start optimizing with all those lovely hyper-parameters that make working with neural nets such a joy. When it is working how you want, you can just take your final output and add back in the last 18 months of your training data.




回答2:


From my review of the examples with the package (see ?trainr) the inputs into the training function have to be binary. There are the functions int2bin and bin2int in the package.

I have not been able to get them to work correctly, but it appears conversion to binary is needed.



来源:https://stackoverflow.com/questions/41879049/why-does-rnn-always-output-1

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!