R randomForest subsetting can't get rid of factor levels [duplicate]

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-03 08:54:34

You cannot run the randomForest predict function on newdata whose factor levels differ from those the model was trained on. The levels of test$storeId range from "2" to "11" and those of train$storeId from "1" to "10", so when you drop level "11" from the test data you are still missing level "1", and that is why predict fails.
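A minimal base-R sketch of the mismatch, assuming `storeId` was built from integers as described (training stores 1-10, test stores 2-11). `train_ids` and `test_ids` are hypothetical stand-ins for `train$storeId` and `test$storeId`; no model is needed to see that dropping store 11 still leaves level "1" missing:

```r
# Hypothetical stand-ins for train$storeId and test$storeId
train_ids <- factor(1:10)   # levels "1" to "10"
test_ids  <- factor(2:11)   # levels "2" to "11"

# Drop store 11 and its now-empty level, as the answer does
test_kept <- droplevels(test_ids[test_ids != 11])

# Levels the model knows about but the test factor lacks, and vice versa
missing_in_test <- setdiff(levels(train_ids), levels(test_kept))
extra_in_test   <- setdiff(levels(test_kept), levels(train_ids))

print(missing_in_test)                                  # "1"
print(extra_in_test)                                    # character(0)
print(identical(levels(train_ids), levels(test_kept)))  # FALSE
```

Running a check like this on every factor column before calling predict makes it obvious which levels need fixing.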

This is in fact a duplicate. You should be using droplevels, but even after fixing that problem you are ignoring the fact that the levels still don't line up. You simply have to alter the levels so that they are the same as in the training data:

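```r
# Drop store 11 and the now-unused level "11"
test1 <- droplevels(subset(test, storeId != 11))
# Rename the existing levels ("2" to "10", assumed in numeric order) and append "1" as an extra, empty level
levels(test1$storeId) <- as.character(c(2:10, 1))
pred <- predict(RF1, test1)
```

```
> pred
       1        2        3        4        5        6        7        8        9 
698.9186 703.9761 654.5370 561.3058 491.1836 736.4316 639.8752 586.1755 782.1186 
```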
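One caveat: `levels<-` renames levels by position, not by label, so the line above is only correct when the existing levels are exactly "2" to "10" in numeric order. If `storeId` had instead been built from character strings (an assumption; the question does not say), the levels would sort as text and the same assignment would silently mislabel stores. A sketch comparing it with label-based alignment through `factor(..., levels = ...)`, which keeps every value's ID and also reproduces the training level order; the variable names are hypothetical:

```r
# Hypothetical: store IDs stored as text, so factor levels sort alphabetically
test_ids <- factor(as.character(2:11))
kept     <- droplevels(test_ids[test_ids != "11"])  # "10" sorts before "2"

# Positional relabelling, as in the answer: the first level ("10") becomes "2", etc.
relabelled <- kept
levels(relabelled) <- as.character(c(2:10, 1))

# Label-based alignment: values keep their IDs, "1" is added as an empty level
train_levels <- as.character(1:10)
aligned <- factor(as.character(kept), levels = train_levels)

print(all(as.character(kept) != as.character(relabelled)))   # TRUE: every store relabelled
print(identical(as.character(kept), as.character(aligned)))  # TRUE
print(identical(levels(aligned), train_levels))              # TRUE
```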

The moral here is simply that because your training data had a factor with levels 1, 2, ..., 10, your test data must have exactly the same set of levels (whether or not you have any data for some of those levels).
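The same rule applies to every factor column, not just `storeId`. A minimal sketch of a hypothetical helper, `align_factor_levels()` (not part of randomForest), that re-encodes each factor column of new data with the training column's levels. Values never seen in training become NA with a warning; dropping or handling those rows first, as the answer does for store 11, is left to the caller. The toy `train`/`test` frames are assumptions shaped like the question:

```r
# Hypothetical helper: give every factor column of `newdata` exactly the
# levels (same labels, same order) of the matching column in `traindata`.
align_factor_levels <- function(newdata, traindata) {
  for (col in names(traindata)) {
    if (is.factor(traindata[[col]]) && col %in% names(newdata)) {
      train_levels <- levels(traindata[[col]])
      values <- as.character(newdata[[col]])
      unseen <- setdiff(unique(values[!is.na(values)]), train_levels)
      if (length(unseen) > 0) {
        warning(sprintf("column '%s': values not seen in training become NA: %s",
                        col, paste(unseen, collapse = ", ")))
      }
      newdata[[col]] <- factor(values, levels = train_levels)
    }
  }
  newdata
}

# Toy data shaped like the question: train stores 1-10, test stores 2-11
train <- data.frame(storeId = factor(1:10), sales = seq(100, 1000, by = 100))
test  <- data.frame(storeId = factor(2:11), sales = seq(200, 1100, by = 100))

test1 <- align_factor_levels(subset(test, storeId != 11), train)
print(identical(levels(test1$storeId), levels(train$storeId)))  # TRUE
print(nrow(test1))                                              # 9
```

Matching by label rather than by position means the helper does not depend on how the factor was created or how its levels happen to sort.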
