Classification tree in R limit to 32 levels

Posted by 假装没事ソ on 2019-12-23 03:59:12

Question


I am trying to create a classification tree in R using the package tree.

This is an excerpt of the dataset I am using (header included):

CENTRO_EXAMEN,NOMBRE_AUTOESCUELA,MES,TIPO_EXAMEN,NOMBRE_PERMISO,PROB
Alcalá de Henares,17APTOV,5,PRUEBA DESTREZA,A2 ,0
Alcalá de Henares,17APTOV,5,PRUEBA CONDUCCION Y CIRCULACION,B  ,0.8
Alcalá de Henares,17APTOV,5,PRUEBA TEORICA,B  ,0.333333333
Alcalá de Henares,2000,5,PRUEBA TEORICA,B  ,0

and these are the commands I am issuing to R:

madrid=read.csv("madrid.csv",header=T,na.strings="?")
#madrid=na.omit(madrid)
names(madrid)
dim(madrid)
fix(madrid)
library(tree)
attach(madrid)

# build the tree
High=ifelse(PROB<=0.5,"No","Yes")
madrid=data.frame(madrid,High)
tree.madrid=tree(High~CENTRO_EXAMEN+NOMBRE_AUTOESCUELA+MES+TIPO_EXAMEN+NOMBRE_PERMISO,madrid)
summary(tree.madrid)
plot(tree.madrid)
text(tree.madrid,pretty=0)
tree.madrid

R returns the following error after issuing tree.madrid

Error in tree(High ~ CENTRO_EXAMEN + NOMBRE_AUTOESCUELA + MES + TIPO_EXAMEN +  : 
  factor predictors must have at most 32 levels

Any idea why?


Answer 1:


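Basically, the limit exists because splitting on a factor with many levels is computationally expensive. A factor with k levels can be divided into two groups in 2^(k-1) - 1 ways, so for 32 levels the tree may have to pick the best split out of roughly 2^31 (about 2.1 billion) candidates.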
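To get a feel for how quickly the number of candidate splits grows, here is a minimal sketch. It only evaluates the counting formula: a factor with k levels can be divided into two non-empty groups in 2^(k-1) - 1 ways. The function name n_binary_splits is made up for illustration and is not part of the tree package.

```r
# Number of distinct ways to divide the k levels of a factor into two
# non-empty groups (one group per branch of a binary split).
# n_binary_splits is a hypothetical helper, not a function from tree.
n_binary_splits <- function(k) 2^(k - 1) - 1

# Candidate splits for factors with 3, 10 and 32 levels.
splits <- n_binary_splits(c(3, 10, 32))
print(format(splits, big.mark = ","))  # 3; 511; 2,147,483,647
```

Each additional level roughly doubles the count, which is why an exhaustive search quickly becomes impractical.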

If you are able to use a random forest, a comment by Ben suggests that the randomForest package can now handle factors with up to 53 levels. If you cannot use a random forest for whatever reason, consider collapsing the levels of your categorical predictor, for example by merging rare levels into a single "Other" level, until it has at most 32.
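One possible way to collapse levels in base R is sketched below. It keeps the most frequent levels and merges everything else into a single catch-all level. The helper collapse_levels, the label "Other", and the toy data frame are all assumptions made for illustration; none of them come from the question or from the tree package. The sketch also assumes that "Other" is not already one of the level names.

```r
# Hypothetical helper: keep the n_keep most frequent levels of a factor
# and merge all remaining levels into one level named `other`.
# Assumes `other` is not already a level of x.
collapse_levels <- function(x, n_keep = 31, other = "Other") {
  x <- as.factor(x)
  if (nlevels(x) <= n_keep + 1) return(x)      # already within the limit
  counts <- sort(table(x), decreasing = TRUE)  # level frequencies, largest first
  keep <- names(counts)[seq_len(n_keep)]
  x_chr <- as.character(x)
  x_chr[!is.na(x_chr) & !(x_chr %in% keep)] <- other
  factor(x_chr, levels = c(keep, other))
}

# Toy data standing in for the real file: 40 made-up school names,
# where school_01 occurs 40 times and school_40 occurs once.
toy <- data.frame(
  school = factor(rep(sprintf("school_%02d", 1:40), times = 40:1)),
  month  = 5
)

# Count the levels of every factor column to find the ones above 32.
print(sapply(Filter(is.factor, toy), nlevels))

toy$school <- collapse_levels(toy$school)
print(nlevels(toy$school))  # 31 kept levels plus "Other"
```

In the question's data, the same level count can be run on madrid to see which predictor exceeds the limit (NOMBRE_AUTOESCUELA looks like a plausible candidate, but that is a guess), and that column can then be passed through the helper before calling tree(). Merging rare levels discards information, so it is worth checking how many rows end up in the catch-all level.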



Source: https://stackoverflow.com/questions/37678420/classification-tree-in-r-limit-to-32-levels
