How to implement the output of decision tree built using the ctree (party package)?

Submitted by 一世执手 on 2019-12-13 14:08:27

Question


I have built a decision tree using the ctree function from the party package. It has 1700 nodes. Firstly, is there a way to give ctree a maxdepth argument? I tried the control_ctree option, but it threw an error message saying it couldn't find the ctree function.

Also, how can I consume the output of this tree? How can it be implemented on other platforms like SAS or SQL? I also have a doubt about what the value "* weights = 4349" at the end of a node signifies. How will I know which terminal node votes for which predicted value?


Answer 1:


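There is a maxdepth option in ctree. It is set through ctree_control(). Note that the function is named ctree_control, not control_ctree, which may be why you got an error.

You can use it as follows: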

library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

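You can also restrict the split and bucket sizes to be "no less than" a given value: minsplit is the minimum sum of weights a node needs in order to be considered for splitting, and minbucket is the minimum sum of weights in a terminal node.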

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(minsplit= 50, minbucket = 20))

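You can also make the tree more conservative by raising mincriterion, the value of 1 - p-value that a test must exceed for a split to be made (0.99 corresponds to a p-value below 0.01):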

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(mincriterion = 0.99))

The weights = 4349 you mention is simply the number of observations in that node. By default ctree gives every observation a weight of 1, so the sum of the weights in a node equals its observation count. If some observations deserve more weight, you can pass a weights vector to ctree(); it must have the same length as the data set and contain non-negative integers. In that case weights = 4349 is a sum of weights rather than a plain count and has to be interpreted with caution.

One way of using the weights is to see which observations fell into a given node. Using the data from the example above:
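As a rough check of this, and of which terminal node gives which predicted value, here is a small sketch. It assumes the party package is installed and that the default unit weights are used. The names node_id, node_size and node_mean are only illustrative.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

# terminal node ID for every training observation
node_id <- where(airct)

# observations per terminal node (matches "weights = ..." when all weights are 1)
node_size <- table(node_id)

# mean Ozone per terminal node; for a regression tree this is the node's prediction
node_mean <- tapply(airq$Ozone, node_id, mean)

print(node_size)
print(node_mean)
```

For a classification tree you would tabulate the response per node instead of taking the mean, for example with table(node_id, response).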

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
unique(where(airct))  # IDs of the terminal nodes
[1] 5 3 6 9 8

So we can check what fell into node number 5, for example:

n <- nodes(airct , 5)[[1]]
x <- airq[which(as.logical(n$weights)), ]  
x
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
...

Using this method you can create data sets that contain the information of your terminal nodes and then import them into SAS or SQL.

You can also get the list of splitting conditions using the function from my answer to the question "ctree() - How to get the list of splitting conditions for each terminal node?"
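A minimal sketch of that export step, assuming the party package is installed. The column name node and the file name airq_nodes.csv are made up for illustration, and the file is written to a temporary directory here:

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

# attach the terminal node ID to each row of the training data
airq_nodes <- airq
airq_nodes$node <- where(airct)

# write one CSV with all rows, ready to be loaded into SAS or a SQL table
out_file <- file.path(tempdir(), "airq_nodes.csv")
write.csv(airq_nodes, out_file, row.names = FALSE)
```

Keeping all rows in one file with a node column, instead of one file per node, lets the target system filter or aggregate by node itself.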



Source: https://stackoverflow.com/questions/18399510/how-to-implement-the-output-of-decision-tree-built-using-the-ctree-party-packag
