R Caret Package error imputing data with Pre-Process function

狂风中的少年, submitted 2019-12-25 05:31:46

Question


I have a dataset (split into training and testing sets) with missing data, and I would like to impute the missing values before classification.

I tried the preProcess function from the caret package. I want to impute the training set using its predictor variable, and to impute the testing set using only what was learned from the training set, without using the predictor of the testing set (that I should not know).

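library(caret)
# Fit the imputation model on the training set
p = preProcess(x = training, method = "knnImpute", k = 10)
# Apply it to the training set, then to the testing set
pred = predict(object = p, newdata = training)
pred1 = predict(object = p, newdata = testing)

When I run this code, the first predict call (the one on the training set) fails with this error: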

Error in FUN(newX[, i], ...) : 
  cannot impute when all predictors are missing in the new data point

I also tried removing the predictor variable from the training set, but the result is the same. I then tried the same steps on the iris dataset, removing some values in each column and removing the predictor, and it works. Yet the two datasets have the same characteristics: both are data.frames and both contain only numeric values.


Answer 1:


From your words ("without using the predictor of the testing set (that I should not know)"), I conclude that by "predictor" you mean the target variable - which is by itself a mistake. "Predictors" are the known features, from which we wish to predict the target variable...

If I am correct, you are actually trying to predict the target variable using missing-value imputation, which is again a mistake and not the purpose of imputation. The correct use is when some (but not all) values are missing from your predictors (features), and you want to impute them so that they can be used, say, as input to an ML algorithm that does not tolerate missing values.
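As a minimal sketch of that usage, the following imputes only the feature columns, learning the imputation from the training rows and then applying it to the testing rows. It assumes the iris data with Species standing in for the target; the variable names (features, target, train_x, test_x) and the pattern of removed values are illustrative and not from the original post. caret's knnImpute may also need the RANN package installed, and it centers and scales the features as part of the imputation.

```r
library(caret)  # knnImpute may additionally require the RANN package

data(iris)

# Assumption for illustration: Species is the target, the rest are features.
features <- iris[, 1:4]
target   <- iris$Species

# Remove 10 values per feature column at fixed, non-overlapping rows,
# so that no row ends up with every feature missing.
for (j in seq_along(features)) {
  features[seq(j, nrow(features), by = 15), j] <- NA
}

# The target is used only to stratify the split, never for imputation.
set.seed(1)
idx     <- createDataPartition(target, p = 0.7, list = FALSE)
train_x <- features[idx, ]
test_x  <- features[-idx, ]

# Learn the imputation from the training features only.
pp <- preProcess(train_x, method = "knnImpute", k = 10)

# Apply it to both sets; the returned values are centered and scaled.
train_imp <- predict(pp, newdata = train_x)
test_imp  <- predict(pp, newdata = test_x)

anyNA(train_imp)
anyNA(test_imp)
```

The target column never enters preProcess or predict here, so nothing about the testing labels is used when filling in the testing features.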




Answer 2:


I faced the same error and worked out that it occurs when the data set being imputed (here, training) was created with createDataPartition by splitting the original data into training and testing sets. Imputing works fine if you apply it to the original set before the split.
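The error message itself names a specific condition: a data point in which all predictors are missing. A minimal base-R sketch for checking a data frame for such rows is given here; all_missing_rows and training_demo are hypothetical names used only for illustration, with training_demo standing in for the real training set.

```r
# Hypothetical helper: indices of rows in which every listed column is NA.
all_missing_rows <- function(df, cols = names(df)) {
  unname(which(rowSums(is.na(df[, cols, drop = FALSE])) == length(cols)))
}

# Toy data frame standing in for `training`; row 2 has no observed values.
training_demo <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

bad <- all_missing_rows(training_demo)
bad

# Drop the fully missing rows before passing the data to predict().
training_clean <- if (length(bad) > 0) {
  training_demo[-bad, , drop = FALSE]
} else {
  training_demo
}
```

Running the same check on the full data and on each partition may show whether a particular subset contains a row that knnImpute cannot handle.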



Source: https://stackoverflow.com/questions/29351656/r-caret-package-error-imputing-data-with-pre-process-function
