Question
I have a dataset (split into training and testing sets) with missing data, and I would like to impute the missing values before classification.
I tried the preProcess function from the caret package. I want to impute the training set using the predictor variable, and impute the testing set using only what was learned from the training set, without using the predictor of the testing set (that I should not know).
p = preProcess(x = training, method = "knnImpute", k = 10)
pred = predict(object = p, newdata = training)
pred1 = predict(object = p, newdata = testing)
When I run this code, I get this error on the second line:
Error in FUN(newX[, i], ...) :
cannot impute when all predictors are missing in the new data point
I also tried removing the predictor variable from the training set, but the result is the same. When I try the same thing with the iris dataset, removing some values in each column and removing the predictor, it works... yet the two datasets have the same characteristics: both are data.frames and both contain only numeric values.
Answer 1:
From your words ("without using the predictor of the testing set (that I should not know)"), I conclude that by "predictor" you mean the target variable - which is by itself a mistake. "Predictors" are the known features, from which we wish to predict the target variable...
If I am correct, you are actually trying to predict the target variable using missing value imputation, which is again a mistake and not the purpose of missing value imputation. The correct use is when some (but not all) values are missing from your predictors (features), and you want to impute them so that they can be used, say, as input to an ML algorithm that does not tolerate missing values.
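As a rough illustration of that usage, here is a minimal sketch that fits the imputation model on the training features only and then applies it to both sets, leaving the target column out entirely. The iris data, the injected NAs, and the names `features`, `train_imp` and `test_imp` are assumptions made for the example, not taken from the question. The sketch also drops rows in which every feature is missing, since that is the condition the error message describes. Note that `knnImpute` centers and scales the data as well, and depending on the caret version the RANN package may need to be installed.

```r
library(caret)

set.seed(1)

# Illustrative data: four numeric features plus a target column (Species)
dat <- iris
# Inject some missing feature values; no row loses all of its features
dat[sample(nrow(dat), 10), "Sepal.Length"] <- NA
dat[sample(nrow(dat), 10), "Petal.Width"] <- NA

idx <- createDataPartition(dat$Species, p = 0.7, list = FALSE)
training <- dat[idx, ]
testing <- dat[-idx, ]

# Impute the features only; the target stays out of the imputation
features <- setdiff(names(dat), "Species")

# A row with every feature missing cannot be imputed, so drop such rows
all_missing <- rowSums(is.na(training[, features])) == length(features)
training <- training[!all_missing, ]

# Fit on the training features, then apply the same fit to both sets
p <- preProcess(training[, features], method = "knnImpute", k = 10)
train_imp <- predict(p, newdata = training[, features])
test_imp <- predict(p, newdata = testing[, features])
```

The same `all_missing` check can be run on the testing set before calling `predict`, because a testing row with no observed features would trigger the same error.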
Answer 2:
I faced the same error and worked out that it arises when the data set being imputed (here, training) was created with createDataPartition, which splits the original data into training and testing sets. Imputation works fine if you apply it to the original set before the split.
Source: https://stackoverflow.com/questions/29351656/r-caret-package-error-imputing-data-with-pre-process-function