R rpart: No splits if I remove less important variables

混江龙づ霸主 submitted on 2019-12-24 04:56:31

Question


I am trying to understand how rpart works for a project I am working on. I am relatively new to R, but I have a lot of experience building a variety of analytical models in SAS.

First, I ran this code:

mtree1 <- rpart(X17 ~ ., data = mydata, method = "class", control = rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval = 10))

I get a tree with X12 as the top split, X10 as the next split on the left-hand side, X69 on the right-hand side, and then X68 and X70 on that branch.

Next, I ran the following, keeping only the variables that appeared in that tree:

mtree1 <- rpart(X17 ~ X12 + X10 + X69 + X68 + X70, data = mydata, method = "class", control = rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval = 10))

I get exactly the same tree.

Finally, I ran this, dropping X10:

mtree1 <- rpart(X17 ~ X12 + X69 + X68 + X70, data = mydata, method = "class", control = rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval = 10))

Now I get no splits at all. (BTW, my data set has 234144 observations and 90 independent variables, with 210205 goods and 23839 bads.)
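One thing that may be worth checking: none of the calls above set cp, so rpart.control falls back to its default of cp = 0.01, and rpart does not keep a split that fails to improve the overall fit by at least that factor. With a rare class, a tree can lose every split this way. The sketch below is only a diagnostic idea, not a confirmed explanation of the result above. It uses simulated data, and the names toy, x1, x2, y and n_splits are hypothetical stand-ins, since the original mydata is not available.

```r
library(rpart)

# Hypothetical stand-in data (NOT the original `mydata`): an imbalanced
# two-class outcome where "bad" is the rare class and depends weakly on x1.
set.seed(1)
n     <- 5000
x1    <- runif(n)
x2    <- runif(n)
p_bad <- 0.05 + 0.15 * x1
y     <- factor(ifelse(runif(n) < p_bad, "bad", "good"))
toy   <- data.frame(y = y, x1 = x1, x2 = x2)

# Same control settings as in the question; cp is left at its default (0.01).
ctrl_default <- rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10,
                              usesurrogate = 2, xval = 10)

# Identical settings, except that cp = 0 disables the complexity threshold.
ctrl_cp0 <- rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10,
                          usesurrogate = 2, xval = 10, cp = 0)

fit_default <- rpart(y ~ x1 + x2, data = toy, method = "class",
                     control = ctrl_default)
fit_cp0     <- rpart(y ~ x1 + x2, data = toy, method = "class",
                     control = ctrl_cp0)

# Count the splits in a fitted tree: every non-leaf row of $frame is a split.
n_splits <- function(fit) sum(fit$frame$var != "<leaf>")

n_splits(fit_default)  # splits surviving the default cp threshold
n_splits(fit_cp0)      # splits found when no threshold is applied

# The cp table shows how much relative error each additional split removes.
printcp(fit_cp0)
```

Refitting the third model with cp = 0 (or a smaller value such as 0.001) and reading the printcp table would show whether splits on X12, X69, X68 and X70 exist but fall below the default threshold.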

Here is an image of the code and output

What is the reason for this? I would appreciate any help. Thanks. KK

Source: https://stackoverflow.com/questions/46551031/r-rpart-no-splits-if-i-remove-less-important-variables
