Caret and KNN in R: predict function gives error

Posted by 魔方 西西 on 2019-12-03 23:01:48

Question


I am trying to predict with a simplified KNN model using the caret package in R. It always gives the same error, even in this very simple reproducible example:

library(caret)
set.seed(1)

#generate training dataset "a" 
n = 10000
a = matrix(rnorm(n*8,sd=1000000),nrow = n)
y = round(runif(n))
a = cbind(y,a)
a = as.data.frame(a)
a[,1] = as.factor(a[,1])
colnames(a) = c("y",paste0("V",1:8))

#estimate simple KNN model
ctrl <- trainControl(method="none",repeats = 1)
knnFit <- train(y ~ ., data = a, method = "knn", trControl = ctrl, preProcess = c("center","scale"),  tuneGrid = data.frame(k = 10))

#predict on the training dataset (=useless, but should work)
knnPredict <- predict(knnFit,newdata = a,  type="prob")

This gives

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : undefined columns selected

Defining a more realistic test dataset "b" without the target variable y...

#generate test dataset
b =  matrix(rnorm(n*8,sd=1000000),nrow = n) 
b = as.data.frame(b)
colnames(b) = c(paste0("V",1:8))

#predict on the test dataset
knnPredict <- predict(knnFit,newdata = b,  type="prob")

gives the same error

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : undefined columns selected

I know that the column names are important, but here they are identical. What is wrong here? Thanks!


Answer 1:


The problem is your y variable. When you ask for class probabilities, train() and/or predict() put them into a data frame with one column per class. If the factor levels are not valid R variable names, they are changed automatically (e.g. "0" becomes "X0"), so the columns no longer match the original levels "0" and "1", which is consistent with the "undefined columns selected" error. See also this post.
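
The renaming can be reproduced in base R without caret. This is only a minimal sketch: it assumes the conversion follows the same rules as make.names(), and the variable names (lv, valid, probs, missing_cols) are illustrative, not taken from caret's source.

```r
# Factor levels as produced by as.factor() on a 0/1 column
y <- as.factor(c(0, 1, 1, 0))
lv <- levels(y)                  # "0" "1"

# Syntactically valid versions of those levels
valid <- make.names(lv)          # "X0" "X1"

# A data frame with one column per class, named with the valid versions
probs <- data.frame(matrix(0.5, nrow = 2, ncol = 2))
names(probs) <- valid

# The original levels no longer match any column name,
# so selecting probs[, lv] would fail with "undefined columns selected"
missing_cols <- setdiff(lv, names(probs))
print(missing_cols)
```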

If you replace the line a[,1] = as.factor(a[,1]) in your code with the following, it should work:

a[,1] = factor(a[,1], labels = c("no", "yes"))
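
Before training, it may help to check that the factor levels are already valid R names. The helper below is a hypothetical sketch (has_valid_levels is not part of caret); it simply compares the levels with what make.names() would return.

```r
# Hypothetical helper: TRUE if every level is already a valid R name
has_valid_levels <- function(f) {
  lv <- levels(f)
  identical(lv, make.names(lv))
}

# Levels "0" and "1" are not valid names
y_bad <- as.factor(c(0, 1, 1, 0))

# Relabelled levels "no" and "yes" are valid names
y_good <- factor(c(0, 1, 1, 0), levels = c(0, 1), labels = c("no", "yes"))

bad_ok  <- has_valid_levels(y_bad)    # FALSE
good_ok <- has_valid_levels(y_good)   # TRUE
print(c(bad_ok, good_ok))
```

Stating levels = c(0, 1) explicitly makes the mapping from 0 to "no" and from 1 to "yes" independent of the order in which the values happen to be sorted.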


Source: https://stackoverflow.com/questions/33200033/caret-and-knn-in-r-predict-function-gives-error
