Properties and their values out of J48 tree (RWeka)

Posted by 断了今生、忘了曾经 on 2019-12-11 02:58:31

Question


If you run the following:

library(RWeka) 
data(iris) 
res = J48(Species ~., data = iris)

res will be a list of class J48 inheriting from Weka_tree. Printing it gives:

R> res
J48 pruned tree
------------------

Petal.Width <= 0.6: setosa (50.0)
Petal.Width > 0.6
|   Petal.Width <= 1.7
|   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
|   |   Petal.Length > 4.9
|   |   |   Petal.Width <= 1.5: virginica (3.0)
|   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
|   Petal.Width > 1.7: virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :  9

I would like to extract the properties (the splitting variables) and their values, in order from right to left. For this tree that would be:

Petal.Width, Petal.Width, Petal.Length, Petal.Length.

I tried converting res to a factor and running the command:

str_extract(paste0(x, collapse=""), perl("(?<=\\|)[A-Za-z]+(?=\\|)"))

with no success. Keep in mind that the surrounding characters on the left should be ignored.


Answer 1:


One way to do this is to convert the J48 object from RWeka to a party object from partykit. You just need to call as.party(res): this does all the parsing for you and returns a structure that is easier to work with, with standardized extractor functions etc.

In particular, you can then apply the advice given in other discussions about ctree objects. See:

  • How to extract the splitting rules for the terminal nodes of ctree()

  • Get decision tree rule/path pattern for every row of predicted dataset for rpart/ctree package in R

  • Identify all distinct variables within party ctree nodel

And I think the following should do at least part of what you want:

library("partykit")
pres <- as.party(res)
partykit:::.list.rules.party(pres)
##                                                                                  2 
##                                                               "Petal.Width <= 0.6" 
##                                                                                  5 
##                     "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9" 
##                                                                                  7 
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5" 
##                                                                                  8 
##  "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5" 
##                                                                                  9 
##                                            "Petal.Width > 0.6 & Petal.Width > 1.7" 
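
If only the splitting variable names are needed, the rule strings can be post-processed with base R string functions. The following is a minimal sketch, not part of the original answer: it assumes the rules have exactly the "variable operator value" form shown above, joined by " & ", and it copies the strings in by hand so that it runs without RWeka or partykit. The names rules and split_vars are illustrative only.

```r
# Rule strings copied from the output above, keyed by terminal node id.
# With a fitted tree you would presumably start from the vector returned
# by partykit:::.list.rules.party(pres) instead.
rules <- c(
  "2" = "Petal.Width <= 0.6",
  "5" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9",
  "7" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5",
  "8" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5",
  "9" = "Petal.Width > 0.6 & Petal.Width > 1.7"
)

# Split each rule into its conditions, then drop everything from the
# comparison operator onwards so that only the variable name remains.
split_vars <- lapply(strsplit(rules, " & ", fixed = TRUE), function(conds) {
  sub("\\s*(<=|>=|<|>).*$", "", conds)
})

# Variables along the path to terminal node 7, from root to leaf
split_vars[["7"]]

# Distinct splitting variables used anywhere in the tree
unique(unlist(split_vars, use.names = FALSE))
```

Reversing one of the vectors with rev() gives the leaf-to-root order instead.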

Update: The OP contacted me off-list with a related question, asking for a specific printed representation of the tree. I'm including my solution here in case it is useful for someone else.

He wanted to have ( ) symbols signalling the hierarchy levels plus the names of the splitting variables. One way to do so would be to (1) extract variable names of the underlying data:

nam <- names(pres$data)

(2) Turn the recursive node structure of the tree into a flat list (which is somewhat more convenient for constructing the desired string):

tr <- as.list(pres$node)

(3a) Initialize the string:

str <- "("

(3b) Recursively add brackets and/or variable names to the string:

update_str <- function(x) {
   if(is.null(x$kids)) {
     str <<- paste(str, ")")
   } else {
     str <<- paste(str, nam[x$split$varid], "(")
     for(i in x$kids) update_str(tr[[i]])
   }
}

(3c) Call the recursion, starting from the root node:

update_str(tr[[1]])
str
## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
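
The same recursion can also be written without the global assignment (<<-), by letting each call return its piece of the string. The sketch below is only an illustration under stated assumptions: toy_nodes is a hand-built stand-in that imitates the flat node list for the iris tree above (with the variable id stored directly in a hypothetical varid field rather than under split), toy_names is an assumed column order, and build_str is a made-up helper name.

```r
# Assumed variable names; the real ones would come from names(pres$data).
toy_names <- c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Hand-built flat node list mirroring the printed iris tree:
# inner nodes carry a varid and the ids of their kids, leaves carry neither.
toy_nodes <- list(
  list(id = 1L, varid = 5L, kids = c(2L, 3L)),
  list(id = 2L),
  list(id = 3L, varid = 5L, kids = c(4L, 9L)),
  list(id = 4L, varid = 4L, kids = c(5L, 6L)),
  list(id = 5L),
  list(id = 6L, varid = 5L, kids = c(7L, 8L)),
  list(id = 7L),
  list(id = 8L),
  list(id = 9L)
)

# Return the string for the subtree rooted at node `id`:
# a leaf contributes ")", an inner node contributes its variable name,
# an opening bracket, and then the strings of its kids.
build_str <- function(id) {
  node <- toy_nodes[[id]]
  if (is.null(node$kids)) return(")")
  kid_strs <- vapply(node$kids, build_str, character(1))
  paste(c(toy_names[node$varid], "(", kid_strs), collapse = " ")
}

result <- paste("(", build_str(1L))
result
```

Returning the string instead of mutating a global makes the function safe to call repeatedly without re-initializing anything first.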



Answer 2:


I hope I'm not missing your point here, but I assume you want to create and somehow store the rules based on the terminal nodes of your tree model. Personally, I've found that the tree-building packages (RWeka, party, partykit, rpart) lack a convenient way for the user to create a useful list of rules after the model is built. Of course, when you have few variables and splits you can interpret the tree plot.

The only easy and robust way I've found so far (and the one I use myself) is the function path.rpart from the rpart package. If you really want to use RWeka this solution will not apply directly, but I'll give it a try:

library(rpart)

res = rpart(Species ~., data = iris)

res

# n= 150 
# 
# node), split, n, loss, yval, (yprob)
# * denotes terminal node
# 
# 1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)  
#   2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
#   3) Petal.Length>=2.45 100  50 versicolor (0.00000000 0.50000000 0.50000000)  
#     6) Petal.Width< 1.75 54   5 versicolor (0.00000000 0.90740741 0.09259259) *
#     7) Petal.Width>=1.75 46   1 virginica (0.00000000 0.02173913 0.97826087) *


# capture terminal nodes
terminal_nodes = rownames(res$frame)[res$frame$var =="<leaf>"]

# print rules for the terminal nodes
path.rpart(res ,nodes=terminal_nodes)

# node number: 2 
# root
# Petal.Length< 2.45
# 
# node number: 6 
# root
# Petal.Length>=2.45
# Petal.Width< 1.75
# 
# node number: 7 
# root
# Petal.Length>=2.45
# Petal.Width>=1.75


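# collect the above rules as a list, dropping the leading "root" entry of each path
# (print.it = FALSE suppresses the printout already shown above)
rules = path.rpart(res, nodes = terminal_nodes, print.it = FALSE)
sapply(rules, "[", -1)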

# $`2`
# [1] "Petal.Length< 2.45"
# 
# $`6`
# [1] "Petal.Length>=2.45" "Petal.Width< 1.75" 
# 
# $`7`
# [1] "Petal.Length>=2.45" "Petal.Width>=1.75" 
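
To store one rule per terminal node as a single string, the paths can be collapsed with paste. This is a small sketch under the assumption that the input has the same shape as the path.rpart result shown above, a named list of character vectors that each start with "root". The list is typed in by hand here so the snippet runs without fitting a model, and the names paths and rule_strings are illustrative.

```r
# Hand-copied stand-in for the list returned by path.rpart above.
paths <- list(
  `2` = c("root", "Petal.Length< 2.45"),
  `6` = c("root", "Petal.Length>=2.45", "Petal.Width< 1.75"),
  `7` = c("root", "Petal.Length>=2.45", "Petal.Width>=1.75")
)

# Drop the leading "root" entry and join the remaining conditions with " & ".
rule_strings <- vapply(
  paths,
  function(p) paste(p[-1], collapse = " & "),
  character(1)
)

rule_strings
```

The result is a named character vector keyed by node number, which is easy to write to a file or merge with predictions.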


Source: https://stackoverflow.com/questions/32168408/properties-and-their-values-out-of-j48-tree-rweka
