R SVM return NA for predictions with missing data

Submitted by 北城余情 on 2019-12-10 17:07:02

Question


I am attempting to make predictions using a trained SVM from package e1071 but my data contains some missing values (NA).

I would like the returned prediction to be NA whenever an instance has any missing values. I tried na.action = na.pass, as shown below, but it fails with the error "Error in names(ret2) <- rowns : 'names' attribute [150] must be the same length as the vector [149]".

If I use na.omit, I get predictions only for the instances without missing data. How can I get predictions that include the NAs?

library(e1071)
model <- svm(Species ~ ., data = iris)
print(length(predict(model, iris)))
tmp <- iris
tmp[1, "Sepal.Length"] <- NA
print(length(predict(model, tmp, na.action = na.pass)))

Answer 1:


If you are familiar with the caret package, which lets you fit 233 different types of models (including SVM from package e1071), look at the section of its documentation called "models clustered by tag similarity". There you can find a CSV file with the data used to group the algorithms.

That file has a column called "Handle Missing Predictor Data", which tells you which algorithms can do what you want. Unfortunately SVM is not among them, but these algorithms are:

  • Boosted Classification Trees (ada)
  • Bagged AdaBoost (AdaBag)
  • AdaBoost.M1 (AdaBoost.M1)
  • C5.0 (C5.0)
  • Cost-Sensitive C5.0 (C5.0Cost)
  • Single C5.0 Ruleset (C5.0Rules)
  • Single C5.0 Tree (C5.0Tree)
  • CART (rpart)
  • CART (rpart1SE)
  • CART (rpart2)
  • Cost-Sensitive CART (rpartCost)
  • CART or Ordinal Responses (rpartScore)

If you still want to use SVM, you could use the knnImpute method of the preProcess function from the same package to impute the missing values. That should allow you to predict for all your observations.
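A minimal sketch of the imputation route, assuming caret is installed alongside e1071 (depending on the caret version, knnImpute may also need the RANN package; treat that as an assumption to check). Because knnImpute also centers and scales the predictors, the SVM here is trained on the preprocessed data so that training and prediction share the same scale. The names predictors, pp, train and new_x are illustrative only.

```r
library(caret)
library(e1071)

predictors <- setdiff(names(iris), "Species")

# Learn centering, scaling and k-nearest-neighbour imputation
# from the (complete) training predictors.
pp <- preProcess(iris[, predictors], method = "knnImpute")

# Train the SVM on the preprocessed predictors.
train <- cbind(predict(pp, iris[, predictors]), Species = iris$Species)
model <- svm(Species ~ ., data = train)

# New data with one missing value, as in the question.
tmp <- iris
tmp[1, "Sepal.Length"] <- NA

# Apply the same preprocessing; the missing value is imputed.
new_x <- predict(pp, tmp[, predictors])
pred <- predict(model, new_x)
print(length(pred))
```

Note that this gives every row a prediction based on imputed values, rather than returning NA for incomplete rows as the question asks.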




Answer 2:


You could just assign all the valid cases back to a prediction variable in the tmp set:

tmp[complete.cases(tmp), "predict"] <- predict(model, newdata=tmp[complete.cases(tmp),]) 
tmp

#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species    predict
#1             NA         3.5          1.4         0.2     setosa       <NA>
#2            4.9         3.0          1.4         0.2     setosa     setosa
#3            4.7         3.2          1.3         0.2     setosa     setosa
# ...
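If a plain prediction vector is preferred over a new column, the same idea can be wrapped in a small helper. This is only a sketch: predict_with_na is a hypothetical name, not part of e1071, and it assumes a classification model, so predictions come back as a factor.

```r
library(e1071)

# Hypothetical helper: predict on complete rows only and
# leave NA in the positions of rows with missing values.
predict_with_na <- function(model, newdata) {
  ok <- complete.cases(newdata)
  pred_ok <- predict(model, newdata[ok, , drop = FALSE])
  # Start from an all-NA factor with the same levels as the predictions.
  out <- factor(rep(NA, nrow(newdata)), levels = levels(pred_ok))
  out[ok] <- pred_ok
  names(out) <- rownames(newdata)
  out
}

model <- svm(Species ~ ., data = iris)
tmp <- iris
tmp[1, "Sepal.Length"] <- NA

pred <- predict_with_na(model, tmp)
print(length(pred))
print(head(pred, 3))
```

Like the one-liner above, complete.cases is evaluated over every column of newdata, so a missing value in a column the model does not use would also yield NA.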


Source: https://stackoverflow.com/questions/42334759/r-svm-return-na-for-predictions-with-missing-data
