I have a training set that looks like this:
Name     Day       Area     X        Y      Month  Night
ATTACK   Monday    LA       -122.41  37.78  8      0
VEHICLE  Saturday  CHICAGO  -1.67    3.15   2      0
MOUSE    Monday    TAIPEI   -12.5    3.1    9      1
Name is the outcome/dependent variable. I converted Name, Area and Day into factors, but I wasn't sure whether I was supposed to do the same for Month and Night, which only take on integer values 1-12 and 0-1, respectively.
I then convert the data into matrices:
ynn <- model.matrix(~Name , data = trainDF)
mnn <- model.matrix(~ Day+Area +X + Y + Month + Night, data = trainDF)
I then set up the tuning parameters:
nnTrControl=trainControl(method = "repeatedcv",number = 3,repeats=5,verboseIter = TRUE, returnData = FALSE, returnResamp = "all", classProbs = TRUE, summaryFunction = multiClassSummary,allowParallel = TRUE)
nnGrid = expand.grid(.size=c(1,4,7),.decay=c(0,0.001,0.1))
model <- train(y=ynn, x=mnn, method='nnet',linout=TRUE, trace = FALSE, trControl = nnTrControl,metric="logLoss", tuneGrid=nnGrid)
However, I get the error "Error: nrow(x) == n is not TRUE" for the model <- train(...) call.
I also get a similar error if I use xgboost instead of nnet.
Does anyone know what's causing this?
y should be a numeric or factor vector containing the outcome for each sample, not a matrix. Using train(y = make.names(trainDF$Name), ...) helps, where make.names modifies the values so that they are valid variable names.
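To see the mismatch without running caret at all, here is a minimal base-R sketch. The toy data frame only mirrors a few of the question's columns and its values are illustrative. The explanation that the check compares nrow(x) against the number of elements in y is my assumption based on the error message, not something I traced through the caret source.

```r
# Toy data mirroring some of the question's columns (illustrative values only)
trainDF <- data.frame(
  Name  = factor(c("ATTACK", "VEHICLE", "MOUSE")),
  Day   = factor(c("Monday", "Saturday", "Monday")),
  Month = c(8, 2, 9),
  Night = c(0, 0, 1)
)

# As in the question: the outcome is expanded into an indicator matrix
ynn <- model.matrix(~ Name, data = trainDF)
mnn <- model.matrix(~ Day + Month + Night, data = trainDF)

print(dim(ynn))     # rows x (intercept + dummy columns)
print(length(ynn))  # length() of a matrix counts every cell
print(nrow(mnn))    # one row per sample

# A factor vector has exactly one element per sample
y <- factor(make.names(as.character(trainDF$Name)))
print(length(y) == nrow(mnn))
```

With three samples and three outcome levels, ynn has nine cells while mnn has three rows, so any check of the form nrow(x) == length(y) fails. Passing the factor y instead keeps the two in step.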
Even though the help file for train says that x can be either a matrix or a data frame, you can try converting the matrix to a data frame:
model <- train(y=ynn, x=as.data.frame(mnn), method='nnet',linout=TRUE, trace = FALSE, trControl = nnTrControl,metric="logLoss", tuneGrid=nnGrid)
Source: https://stackoverflow.com/questions/35527492/error-nrowx-n-is-not-true-when-using-train-in-caret