chaid regression tree to table conversion in r

大兔子大兔子 提交于 2019-11-28 07:04:09

问题


I used the CHAID package from this link ..It gives me a chaid object which can be plotted..I want a decision table with each decision rule in a column instead of a decision tree. .But i dont understand how to access nodes and paths in this chaid object..Kindly help me.. I followed the procedure given in this link

i cant post my data here since it is too long.So i am posting a code which takes the sample dataset provided with chaid to perform the task.

copied from help manual of chaid:

library("CHAID")

  ### fit tree to subsample
  set.seed(290875)
  USvoteS <- USvote[sample(1:nrow(USvote), 1000),]

  ctrl <- chaid_control(minsplit = 200, minprob = 0.1)
  chaidUS <- chaid(vote3 ~ ., data = USvoteS, control = ctrl)

  print(chaidUS)
  plot(chaidUS)

Output:

Model formula:
vote3 ~ gender + ager + empstat + educr + marstat

Fitted party:
[1] root
|   [2] marstat in married
|   |   [3] educr <HS, HS, >HS: Gore (n = 311, err = 49.5%)
|   |   [4] educr in College, Post Coll: Bush (n = 249, err = 35.3%)
|   [5] marstat in widowed, divorced, never married
|   |   [6] gender in male: Gore (n = 159, err = 47.8%)
|   |   [7] gender in female
|   |   |   [8] ager in 18-24, 25-34, 35-44, 45-54: Gore (n = 127, err = 22.0%)
|   |   |   [9] ager in 55-64, 65+: Gore (n = 115, err = 40.9%)

Number of inner nodes:    4
Number of terminal nodes: 5

So my question is how to get this tree data in a decision table with each decision rule(branch/path) in a column..I dont understand how to access different tree paths from this chaid object..


回答1:


CHAID package uses partykit (recursive partitioning) tree structures. You can walk the tree by using party nodes - a node can be terminal or have a list of nodes with information about decision rule (split) and fitted data.

The code below walks the tree and creates the decision table. It is written for demonstration purposes and tested only on one sample tree.

tree2table <- function(party_tree) {

  df_list <- list()
  var_names <-  attr( party_tree$terms, "term.labels")
  var_levels <- lapply( party_tree$data, levels)

  walk_the_tree <- function(node, rule_branch = NULL) {
    # depth-first walk on partynode structure (recursive function)
    # decision rules are extracted for every branch
    if(missing(rule_branch)) {
      rule_branch <- setNames(data.frame(t(replicate(length(var_names), NA))), var_names)
      rule_branch <- cbind(rule_branch, nodeId = NA)
      rule_branch <- cbind(rule_branch, predict = NA)
    }
    if(is.terminal(node)) {
      rule_branch[["nodeId"]] <- node$id
      rule_branch[["predict"]] <- predict_party(party_tree, node$id) 
      df_list[[as.character(node$id)]] <<- rule_branch
    } else {
      for(i in 1:length(node)) {
        rule_branch1 <- rule_branch
        val1 <- decision_rule(node,i)
        rule_branch1[[names(val1)[1]]] <- val1
        walk_the_tree(node[i], rule_branch1)
      }
    }
  }

  decision_rule <- function(node, i) {
    # returns split decision rule in data.frame with variable name an values
    var_name <- var_names[node$split$varid[[1]]]
    values_vec <- var_levels[[var_name]][ node$split$index == i]
    values_txt <- paste(values_vec, collapse = ", ")
    return( setNames(values_txt, var_name))
  }
  # compile data frame list
  walk_the_tree(party_tree$node)
  # merge all dataframes
  res_table <- Reduce(rbind, df_list)
  return(res_table)
}

call function with the CHAID tree object:

table1 <- tree2table(chaidUS)

the result should be something like this:

gender   ager                       empstat   educr              marstat                          nodeId   predict  
-------- -------------------------- --------- ------------------ -------------------------------- -------- ---------
NA       NA                         NA        <HS, HS, >HS       married                          3        Gore     
NA       NA                         NA        College, Post Coll married                          4        Bush     
male     NA                         NA        NA                 widowed, divorced, never married 6        Gore     
female   18-24, 25-34, 35-44, 45-54 NA        NA                 widowed, divorced, never married 8        Gore     
female   55-64, 65+                 NA        NA                 widowed, divorced, never married 9        Gore


来源:https://stackoverflow.com/questions/28428508/chaid-regression-tree-to-table-conversion-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!