How to get J48 size and number of leaves

Submitted by 怎甘沉沦 on 2019-12-24 15:22:22

Question


If I build a J48 tree by:

library(RWeka)

fit <- J48(Species~., data=iris)

I get the following result:

> fit
J48 pruned tree
------------------

Petal.Width <= 0.6: setosa (50.0)
Petal.Width > 0.6
|   Petal.Width <= 1.7
|   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
|   |   Petal.Length > 4.9
|   |   |   Petal.Width <= 1.5: virginica (3.0)
|   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
|   Petal.Width > 1.7: virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :  9

I would like to store the Number of Leaves in a variable N (so N will be 5) and the Size of the tree in S (so S will be 9).

Is there a way to get this information directly from the J48 tree?


Answer 1:


As previously pointed out by @LyzandeR, it is not easy to do this on the J48 object directly. The objects returned by the fitting functions in RWeka generally contain relatively little information on the R side (e.g., only the call and the fitted predictions). The main ingredient is typically a reference to the Java object built by Weka, to which Weka's own methods can be applied on the Java side via .jcall, with the results then returned to R.
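As a sketch of that Java-side route, assuming that the fitted object keeps its Weka model in `fit$classifier` and that Weka's `J48` class exposes its public `measureNumLeaves()` and `measureTreeSize()` methods (both return a Java `double`), the two values could be queried directly with rJava's `.jcall`:

```r
library("RWeka")

fit <- J48(Species ~ ., data = iris)

# Assumption: the Java weka.classifiers.trees.J48 object is stored in
# fit$classifier. Both methods return a Java double, hence the "D"
# return signature; convert to integer counts on the R side.
N <- as.integer(rJava::.jcall(fit$classifier, "D", "measureNumLeaves"))
S <- as.integer(rJava::.jcall(fit$classifier, "D", "measureTreeSize"))
```

These should match the "Number of Leaves" and "Size of the tree" lines of the printed tree, but this route ties the code to Weka's Java method names.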

However, for the J48 trees it is easy to transform the information from the Java side into an R object for which standard functions and methods are available. The partykit package provides a coercion function that transforms J48 trees into constparty objects (recursive partitions with constant fits in the leaves). Then methods like length(), width(), or depth() can be used to query the number of nodes, leaves, and the depth of the tree, respectively.

library("RWeka")
fit <- J48(Species ~ ., data = iris)
library("partykit")
p <- as.party(fit)
length(p)
## [1] 9
width(p)
## [1] 5
depth(p)
## [1] 4
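To end up with exactly the two variables asked for in the question, the conversion can be wrapped in a small helper. `j48_size()` is a hypothetical name used only for this sketch; it relies solely on `as.party()`, `width()` and `length()` as used above.

```r
library("RWeka")
library("partykit")

# Hypothetical helper: number of leaves (N) and total number of
# nodes, i.e. the tree size (S), of a fitted J48 tree.
j48_size <- function(fit) {
  p <- as.party(fit)
  c(N = width(p), S = length(p))
}

fit <- J48(Species ~ ., data = iris)
res <- j48_size(fit)
N <- res[["N"]]
S <- res[["S"]]
```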

Furthermore, predict(), plot(), print() and many other tools are available for the party object.

I would recommend this approach over the text parsing suggested by @LyzandeR because the as.party() conversion does not rely on potentially error-prone text processing. Instead, it internally calls Weka's own graph generator (via .jcall) and then parses the result into the constparty structure.




Answer 2:


Interestingly, it looks like the printed output of fit is created by a .jcall call inside print.Weka_classifier, as can be seen from getAnywhere(print.Weka_classifier). This makes it more difficult (but not impossible) to extract values from the printed output.

In order to store these two values you could do:

library(RWeka)

fit <- J48(Species~., data=iris)

# store the printed output in a
a <- capture.output(fit)

> a
 [1] "J48 pruned tree"                                     "------------------"                                 
 [3] ""                                                    "Petal.Width <= 0.6: setosa (50.0)"                  
 [5] "Petal.Width > 0.6"                                   "|   Petal.Width <= 1.7"                             
 [7] "|   |   Petal.Length <= 4.9: versicolor (48.0/1.0)"  "|   |   Petal.Length > 4.9"                         
 [9] "|   |   |   Petal.Width <= 1.5: virginica (3.0)"     "|   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)"
[11] "|   Petal.Width > 1.7: virginica (46.0/1.0)"         ""                                                   
[13] "Number of Leaves  : \t5"                              ""                                                   
[15] "Size of the tree : \t9"      

# get the output length, so that this works for a tree
# of any size/number of leaves
out_length <- length(a)

# then save the number from the fourth-to-last element to N
N <- as.numeric(gsub('\\D', '', a[out_length - 3]))

# then save the number from the second-to-last element to S
S <- as.numeric(gsub('\\D', '', a[out_length - 1]))

And there you have it:

> N
[1] 5
> S
[1] 9
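The offsets out_length - 3 and out_length - 1 depend on the exact layout of the printed tree. A slightly more defensive sketch looks the two lines up by their labels instead. The helper name `extract_tree_stat()` is hypothetical; it expects a character vector such as the one returned by capture.output(fit).

```r
# Hypothetical helper: find the single line starting with `label` in the
# captured print output and return the number after its colon.
extract_tree_stat <- function(lines, label) {
  hit <- grep(paste0("^", label), lines, value = TRUE)
  if (length(hit) != 1) {
    stop("expected exactly one line starting with '", label, "'")
  }
  as.numeric(sub(".*:\\s*", "", hit))
}

# In practice: a <- capture.output(fit). Here a few lines are copied from
# the output shown above so that the example runs without Java.
a <- c("J48 pruned tree", "------------------", "",
       "Petal.Width <= 0.6: setosa (50.0)", "",
       "Number of Leaves  : \t5", "",
       "Size of the tree : \t9", "")

N <- extract_tree_stat(a, "Number of Leaves")
S <- extract_tree_stat(a, "Size of the tree")
```

Matching on labels keeps working if lines are added or removed around the summary, but it still depends on the wording of Weka's output, which is why the as.party() route from the first answer is the sturdier option.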


Source: https://stackoverflow.com/questions/32693128/how-to-get-j48-size-and-number-of-leaves
