How to use random forests in R with missing values?

终归单人心 2020-12-07 09:33
library(randomForest)
rf.model <- randomForest(WIN ~ ., data = learn)

I would like to fit a random forest model, but I get this error:



        
3 Answers
  •  庸人自扰
    2020-12-07 10:24

    If the missing values may be informative, you can impute them and also add binary indicator variables (with new.vars <- is.na(your_dataset)), then check whether this lowers the error. If new.vars is too large to add to your_dataset in full, you could fit a model on it alone, pick the significant variables with varImpPlot, and add only those to your_dataset. You could also add a single variable that counts the number of NAs in each row: new.var <- rowSums(new.vars)

    This answer is not off-topic: if missingness is informative, accounting for it can offset the increase in model error that comes from an imperfect imputation procedure alone.

    Missing values are informative when they arise from non-random causes, which is especially common in social experiment settings.
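
    A minimal sketch of the approach described in this answer, under stated assumptions. The `learn` data frame below is simulated as a hypothetical stand-in for the asker's data, and `na.roughfix` from the randomForest package (median/mode imputation) is just one possible imputation choice; the answer does not say which method to use. The names `predictors`, `imputed`, `learn2`, `rf.base` and `rf.flag` are illustrative only.

```r
library(randomForest)

set.seed(1)

# Hypothetical stand-in for the asker's `learn` data: a factor response WIN
# and two numeric predictors with some values set to NA.
n <- 200
learn <- data.frame(
  WIN = factor(sample(c("no", "yes"), n, replace = TRUE)),
  x1  = rnorm(n),
  x2  = rnorm(n)
)
learn$x1[sample(n, 30)] <- NA
learn$x2[sample(n, 20)] <- NA

predictors <- setdiff(names(learn), "WIN")

# One indicator column per predictor: TRUE where the value was missing.
new.vars <- is.na(learn[predictors])
colnames(new.vars) <- paste0(predictors, "_missing")

# Single summary variable: number of NAs in each row.
na.count <- rowSums(new.vars)

# Assumed imputation step: na.roughfix fills numeric columns with the
# column median and factor columns with the most frequent level.
imputed <- na.roughfix(learn)

# Imputed data plus the missingness indicators (as 0/1) and the NA count.
learn2 <- cbind(imputed, new.vars * 1L, na_count = na.count)

# Fit with and without the extra variables and compare the final OOB error.
rf.base <- randomForest(WIN ~ ., data = imputed)
rf.flag <- randomForest(WIN ~ ., data = learn2)

tail(rf.base$err.rate[, "OOB"], 1)
tail(rf.flag$err.rate[, "OOB"], 1)

# Inspect which variables, including the indicators, the forest relies on.
varImpPlot(rf.flag)
```

    Because the simulated values are missing completely at random, the indicators should not be expected to help here; on real data, the comparison of the two OOB errors is what shows whether missingness carries information.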
