How to prune a tree in R?

Submitted by 给你一囗甜甜゛ on 2020-01-20 03:17:28

Question


I'm doing a classification using rpart in R. The tree model is trained by:

> tree <- rpart(activity ~ . , data=trainData)
> pData1 <- predict(tree, testData, type="class")

The accuracy for this tree model is:

> sum(testData$activity==pData1)/length(pData1)
[1] 0.8094276

Following a tutorial, I pruned the tree by cross-validation:

> ptree <- prune(tree,cp=tree$cptable[which.min(tree$cptable[,"xerror"]),"CP"])
> pData2 <- predict(ptree, testData, type="class")

The accuracy rate for the pruned tree is still the same:

> sum(testData$activity==pData2)/length(pData2)
[1] 0.8094276

What's wrong with my pruned tree, and how can I prune the tree model using cross-validation in R? Thanks.


Answer 1:


You have used the tree with the minimum cross-validated error. An alternative is the "1-SE rule": select the smallest tree whose cross-validated error is within one standard error of that minimum. The rationale is that, given the uncertainty in the CV error estimates, this smaller tree predicts essentially as well as the minimum-error tree, yet does so with fewer splits.

Plot the cost-complexity vs tree size for the un-pruned tree via:

plotcp(tree)

Then pick the leftmost tree whose cross-validated error falls below the dashed horizontal line (the minimum error plus one standard error), and prune using that tree's cp value.
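The same selection can be done programmatically from the `cptable`. A minimal sketch, using the `kyphosis` data that ships with rpart as a stand-in for your own fitted tree:

```r
library(rpart)

# Fit an example tree; substitute your own rpart object for `fit`.
set.seed(42)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

cptab  <- fit$cptable
best   <- which.min(cptab[, "xerror"])                 # row with minimum CV error
thresh <- cptab[best, "xerror"] + cptab[best, "xstd"]  # min error + 1 SE
row1se <- min(which(cptab[, "xerror"] <= thresh))      # smallest tree within 1 SE
ptree  <- prune(fit, cp = cptab[row1se, "CP"])
```

Because `cptable` is ordered from smallest to largest tree, `min(which(...))` picks the simplest tree meeting the threshold; it can be the same row as `best`, in which case pruning changes nothing.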

There could be many reasons why pruning is not affecting the fitted tree. For example, the best tree may already be the one at which growth stopped under the default stopping rules specified in ?rpart.control.
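If that is the case, one option is to relax the stopping rules so the tree is grown deliberately large, then prune it back. A sketch, again using `kyphosis` in place of your own data:

```r
library(rpart)

# cp = 0 disables the complexity stopping rule and minsplit = 2 permits tiny
# nodes, so the tree is grown much larger than the defaults allow; it is then
# pruned back to the cp value with minimum cross-validated error.
set.seed(42)
big <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0, minsplit = 2, xval = 10))
pruned <- prune(big, cp = big$cptable[which.min(big$cptable[, "xerror"]), "CP"])
```

Growing large and pruning back lets cross-validation, rather than the ad hoc stopping rules, determine the final tree size.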



Source: https://stackoverflow.com/questions/15318409/how-to-prune-a-tree-in-r
