How to compute error rate from a decision tree?

流过昼夜 提交于 2019-11-28 15:45:33

问题


Does anyone know how to calculate the error rate for a decision tree with R? I am using the rpart() function.


回答1:


Assuming you mean computing error rate on the sample used to fit the model, you can use printcp(). For example, using the on-line example,

> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81 

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018

The Root node error is used to compute two measures of predictive performance, when considering values displayed in the rel error and xerror column, and depending on the complexity parameter (first column):

  • 0.76471 x 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., error rate computed on the training sample) -- this is roughly

    class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis)
    1-sum(diag(class.pred))/sum(class.pred)
    
  • 0.82353 x 0.20988 = 0.1728425 (17.2%) is the cross-validated error rate (using 10-fold CV, see xval in rpart.control(); but see also xpred.rpart() and plotcp() which relies on this kind of measure). This measure is a more objective indicator of predictive accuracy.

Note that it is more or less in agreement with classification accuracy from tree:

> library(tree)
> summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis))

Classification tree:
tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
Number of terminal nodes:  10 
Residual mean deviance:  0.5809 = 41.24 / 71 
Misclassification error rate: 0.1235 = 10 / 81 

where Misclassification error rate is computed from the training sample.



来源:https://stackoverflow.com/questions/9666212/how-to-compute-error-rate-from-a-decision-tree

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!