Caret::train - Values Not Imputed

Posted by 梦想与她 on 2019-12-04 17:44:34

Question


I am trying to impute missing values by passing "knnImpute" to the preProcess argument of caret's train() function. In the following example, the values do not appear to be imputed: they remain NA and are then dropped. What am I doing wrong?

Any help is much appreciated.

library("caret")

set.seed(1234)
data(iris)

# mark 8 of the cells as NA, so they can be imputed
row <- sample (1:nrow (iris), 8)
iris [row, 1] <- NA

# split test vs training
train.index <- createDataPartition (y = iris[,5], p = 0.80, list = FALSE)
train <- iris [ train.index, ]
test  <- iris [-train.index, ]

# train the model after imputing the missing data
fit <- train (Species ~ ., 
              train, 
              preProcess = c("knnImpute"), 
              na.action  = na.pass, 
              method     = "rpart" )
test$species.hat <- predict (fit, test)

# 1 of the 30 observations in the test set contains an NA;
# predict() returned no prediction for it, so the assignment above fails:
Error in `$<-.data.frame`(`*tmp*`, "species.hat", value = c(1L, 1L, 1L,  : 
  replacement has 29 rows, data has 30

UPDATE: I have been able to use the preProcess function directly to impute the values. I still don't understand why this does not seem to occur within the train function.

# attempt to impute using nearest neighbors
x <- iris [, 1:4]
pp <- preProcess (x, method = c("knnImpute"))
x.imputed <- predict (pp, newdata = x)

# expect all NAs were populated with an imputed value
stopifnot( all (!is.na (x.imputed)))
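# expect the imputed data to have the same dimensions as the input
# (length() on a data frame only counts columns)
stopifnot( identical (dim (x), dim (x.imputed)))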

Answer 1:


See ?predict.train:

 ## S3 method for class 'train'
 predict(object, newdata = NULL, type = "raw", na.action = na.omit, ...)

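predict() has its own na.action argument, and it defaults to na.omit. Rows of newdata that contain an NA are therefore dropped before the preprocessing is applied, regardless of the na.action passed to train(). Passing na.action = na.pass to predict() keeps them: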

 > length(predict (fit, test))
 [1] 29
 > length(predict (fit, test, na.action = na.pass))
 [1] 30

Max
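
For completeness, here is a sketch of the question's example with the fix from the answer applied. It assumes a caret version in which predict.train accepts na.action, as the help page quoted above shows; depending on the installed version, "knnImpute" may also need an additional package. The names na.rows, train.set and test.set are only illustrative; they replace the question's row, train and test so that the train() function is not shadowed.

```r
library(caret)

set.seed(1234)
data(iris)

# Mark 8 cells in the first column as NA so there is something to impute
na.rows <- sample(seq_len(nrow(iris)), 8)
iris[na.rows, 1] <- NA

# Split into training and test sets
train.index <- createDataPartition(y = iris$Species, p = 0.80, list = FALSE)
train.set <- iris[ train.index, ]
test.set  <- iris[-train.index, ]

# na.pass lets rows with NA reach the knnImpute preprocessing step
fit <- train(Species ~ .,
             data       = train.set,
             preProcess = "knnImpute",
             na.action  = na.pass,
             method     = "rpart")

# na.pass is needed again here: predict.train defaults to na.omit
test.set$species.hat <- predict(fit, newdata = test.set, na.action = na.pass)
```

With na.action = na.pass given to both train() and predict(), the prediction vector has one entry per row of test.set, so the assignment no longer fails with a row-count mismatch.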



Source: https://stackoverflow.com/questions/20054906/carettrain-values-not-imputed
