partykit: justify text in terminal node when unequal regressors' name lengths are included

Submitted by 对着背影说爱祢 on 2021-01-05 07:29:29

Question


I am trying to edit the esthetics of the terminal node to:

  1. Increase the size of the box such that the full names are listed inside of it.

  2. If possible, justify the text inside when in the presence of unequal regressors' name lengths to produce a table-like view of the terminal nodes.

Below are my attempts, using the gp argument (fontsize = 10, boxwidth = 10), but I suspect that these are the wrong options for this purpose.

The mysummary function is heavily inspired by this question.


library("partykit")

set.seed(1234L)
data("PimaIndiansDiabetes", package = "mlbench")
## a simple basic fitting function (of type 1) for a logistic regression
logit <- function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
                  glm(y ~ 0 + x, family = binomial, start = start, ...)}


## Long name regressors
PimaIndiansDiabetes$looooong_name_1 <- rnorm(nrow(PimaIndiansDiabetes))
PimaIndiansDiabetes$looooong_name_2 <- rnorm(nrow(PimaIndiansDiabetes))
## Short name regressor
PimaIndiansDiabetes$short_name <- rnorm(nrow(PimaIndiansDiabetes))


## set up a logistic regression tree
pid_tree <- mob(diabetes ~ glucose        + 
                          looooong_name_1 +
                          looooong_name_2 +
                          short_name      | 
                          pregnant + pressure + triceps + insulin +
                          mass + pedigree + age, data = PimaIndiansDiabetes, fit = logit)

## Summary function from: https://stackoverflow.com/questions/65495322/partykit-modify-terminal-node-to-include-standard-deviation-and-significance-of/65500344#65500344
mysummary <- function(info, digits = 2) {
  n <- info$nobs
  na <- format(names(coef(info$object)))
  cf <- format(coef(info$object), digits = digits)
  se <- format(sqrt(diag(vcov(info$object))), digits = digits)
  t <- format(coef(info$object)/sqrt(diag(vcov(info$object))) ,digits = digits)

  c(paste("n =", n),
    paste("Regressor","beta" ,"[", "t-ratio" ,"]"),
    paste(na, cf, "[",t,"]")
  )
}

#plot tree
plot(pid_tree,
     terminal_panel = node_terminal,
     tp_args = list(FUN = mysummary,fill = c("white")),
     gp = gpar(fontsize = 10,
               boxwidth = 10,           ## apparently this option doesn't belong here,
               margins = rep(0.01, 4))) ## and neither does this one.


This is what I am getting:

but I would like to get something like the following:

Thanks a lot.


Answer 1:


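A simple and basic solution is to use a monospaced (fixed-width) font like Courier or Inconsolata. Because mysummary already pads the regressor names and numbers to a common number of characters with format(), the columns line up as soon as every character has the same width: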

plot(pid_tree, terminal_panel = node_terminal,
  tp_args = list(FUN = mysummary, fill = "white"),
  gp = gpar(fontfamily = "inconsolata"))
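If the header row should line up with the columns as well, one option is to format the header together with the values so that each column gets a single common width. The following is only a minimal sketch under that assumption: mysummary_aligned is a hypothetical name (not part of partykit), and the demonstration at the end uses an lm fit on the built-in mtcars data purely as a stand-in for the info list that a terminal node provides.

```r
## Sketch: build rows whose columns share one width, so that they
## look like a table when drawn in a monospaced font.
## 'mysummary_aligned' is a hypothetical helper, not a partykit function.
mysummary_aligned <- function(info, digits = 2) {
  cf <- coef(info$object)
  se <- sqrt(diag(vcov(info$object)))
  ## format header and body together: one common width per column
  na <- format(c("Regressor", names(cf)))                        # left-justified
  be <- format(c("beta", format(cf, digits = digits)),
               justify = "right")
  tr <- format(c("t-ratio", format(cf / se, digits = digits)),
               justify = "right")
  c(paste("n =", info$nobs),
    paste(na, be, "[", tr, "]"))
}

## Stand-in for a terminal node's info (assumption for illustration only)
fit <- lm(mpg ~ wt + hp, data = mtcars)
writeLines(mysummary_aligned(list(nobs = nrow(mtcars), object = fit)))
```

In the tree plot this would be passed as tp_args = list(FUN = mysummary_aligned, fill = "white"). If Inconsolata is not installed, gp = gpar(fontfamily = "mono") selects the graphics device's default monospaced family instead.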

In addition to this simple text-based table, you can also produce more elaborate tables, e.g., via ggplot2 and gtable as in the following plot taken from: Seibold, Hothorn, Zeileis (2019). "Generalised Linear Model Trees with Global Additive Effects." Advances in Data Analysis and Classification, 13, 703-725. doi:10.1007/s11634-018-0342-1

The code is a little bit involved but available in the replication materials of the article. Specifically, you need these two files:

  • Supplementary material 1

  • Supplementary material 10



Source: https://stackoverflow.com/questions/65507425/partykit-justify-text-in-terminal-node-when-unequal-regressors-name-lengths-ar