How can you automate the addition of overall percentages to summary_rows() in the gt package?

Posted by 大兔子大兔子 on 2021-01-28 14:37:53

Question


In the gt package, the summary_rows() function readily supports calculating the mean of the per-row percentages, but this is not the same as the overall percentage (column total divided by grand total). I've come up with a solution (below) that works, but only by adding the overall percentages one column at a time. Is there a way of 'automating' the addition of these overall percentages?

library(dplyr)
library(gt)

# Create test data
set.seed(1)
df <- tibble(some_letter = sample(letters, size = 10, replace = FALSE),
             num1 = sample(100:200, size = 10, replace = FALSE),
             num2 = sample(100:200, size = 10, replace = FALSE),
             n = num1 + num2) %>% 
      mutate(across(starts_with("num"), ~(.x)/(n), .names = "pct_{col}"))

# Use dplyr to calculate the correct overall totals and percentages [target]
df %>% 
  summarise(across(c(num1, num2, n), sum)) %>%  # funs() is deprecated in dplyr; across() replaces summarise_at()
  mutate(across(starts_with("num"), ~(.x)/(n), .names = "pct_{col}"))

# Create table in gt, using a separate call to summary_rows() for each percentage
gt(df) %>% 
  summary_rows(fns = list(TOTAL = "sum"), columns = vars(num1, num2, n)) %>%
  summary_rows(fns = list(TOTAL = ~ sum(df$num1)/sum(df$n) ), columns = vars(pct_num1) ) %>%
  summary_rows(fns = list(TOTAL = ~ sum(df$num2)/sum(df$n) ), columns = vars(pct_num2) )

Answer 1:


I think the solution you propose is the right one. Because each percentage is computed row by row, its summary value has to be computed separately for each column, so with this approach you need one summary_rows() call per percentage column (pct_num1, pct_num2). The advantage of gt is that it gives you precise control over the value that appears in each cell of the summary row; the disadvantage is that the code is fairly verbose.

The code below shows the same problem on a minimal example. I leave out column n so that the use of rowwise() is easier to see.

library(dplyr)
library(gt)

df_ex <- tribble(
  ~group, ~num1, ~num2,
     "A",     4,     1,
     "B",     5,     5
  ) %>% 
  rowwise() %>% 
  mutate(
    across(starts_with("num"),
      ~ .x / sum(c_across(starts_with("num"))),
     .names = "pct{col}")) %>%
  ungroup()

df_ex
#> # A tibble: 2 x 5
#>   group  num1  num2 pctnum1 pctnum2
#>   <chr> <dbl> <dbl>   <dbl>   <dbl>
#> 1 A         4     1     0.8     0.2
#> 2 B         5     5     0.5     0.5

These are the values that will appear in the summary row:

df_ex %>% 
  summarise(num1 = sum(num1), num2 = sum(num2)) %>%
  rowwise() %>%
  mutate(pctnum1 = num1 / sum(c_across(starts_with("num"))), 
    pctnum2 = num2 / sum(c_across(starts_with("num"))))
#> # A tibble: 1 x 4
#> # Rowwise: 
#>    num1  num2 pctnum1 pctnum2
#>   <dbl> <dbl>   <dbl>   <dbl>
#> 1     9     6     0.6     0.4
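
For comparison, a plain mean of a percentage column does not reproduce these values, which is why fns = "mean" is not enough here. A small check using the same numbers as the example above (df_chk and the n column are introduced here only for illustration):

```r
library(dplyr)

# Same counts as df_ex; n is the row total, pctnum1 the per-row share of num1.
df_chk <- tribble(
  ~group, ~num1, ~num2,
     "A",     4,     1,
     "B",     5,     5
  ) %>%
  mutate(n = num1 + num2,
         pctnum1 = num1 / n)

# Unweighted mean of the row percentages: (0.8 + 0.5) / 2 = 0.65
mean_of_pcts <- mean(df_chk$pctnum1)

# Overall percentage: column total over grand total, 9 / 15 = 0.6
overall_pct <- sum(df_chk$num1) / sum(df_chk$n)

c(mean_of_pcts = mean_of_pcts, overall_pct = overall_pct)
```

The two numbers differ because row B carries twice as many observations as row A, and the unweighted mean ignores that.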

To make the code more readable, you can define functions that compute the values shown in the summary row. This is still the same solution as yours with a few cosmetic changes (use of rowwise() and summary-cell functions defined outside the gt call). Hope you find this useful.

compute_f1 <- function(x, df) {
  sum(df$num1) / sum(df$num1+df$num2)
}

compute_f2 <- function(x, df) {
  sum(df$num2) / sum(df$num1+df$num2)
}

df_ex %>% 
  gt %>% 
  summary_rows(fns = list(TOTAL = "sum"), columns = vars(num1, num2),
    formatter = fmt_number, decimals = 0) %>%
  summary_rows(fns = list(TOTAL = ~ compute_f1(.x, df_ex)), columns = vars(pctnum1),
    formatter = fmt_number, decimals = 1) %>%
  summary_rows(fns = list(TOTAL = ~ compute_f2(.x, df_ex)), columns = vars(pctnum2),
    formatter = fmt_number, decimals = 1) 

Created on 2020-11-14 by the reprex package (v0.3.0)
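
One possible way to avoid a separate call per percentage column, offered as a sketch rather than as part of the original answer: because each row percentage is num / n, the overall percentage equals the mean of the row percentages weighted by n. A single formula can then serve every pct column at once. The example assumes a recent gt release and uses grand_summary_rows() (gt's function for whole-table summary rows) in place of summary_rows(); df_small is an illustrative data frame that keeps the n column from the question.

```r
library(dplyr)
library(gt)

# Illustrative data: counts, row total n, and per-row shares pct_num1 / pct_num2.
df_small <- tibble(
  group = c("A", "B"),
  num1  = c(4, 5),
  num2  = c(1, 5)
) %>%
  mutate(n = num1 + num2,
         across(starts_with("num"), ~ .x / n, .names = "pct_{.col}"))

# Weighted mean of the row shares equals column total / grand total:
# sum((num1 / n) * n) / sum(n) == sum(num1) / sum(n)
check_weighted <- weighted.mean(df_small$pct_num1, df_small$n)
check_direct   <- sum(df_small$num1) / sum(df_small$n)

tbl <- gt(df_small) %>%
  # Totals for the count columns.
  grand_summary_rows(
    columns = c("num1", "num2", "n"),
    fns = list(TOTAL = "sum")
  ) %>%
  # One call covers every pct_* column; the weights come from df_small$n.
  grand_summary_rows(
    columns = starts_with("pct"),
    fns = list(TOTAL = ~ weighted.mean(., df_small$n))
  )

tbl
```

The formula looks up df_small$n by name, so it relies on the summary covering all rows of the table. With row groups the weights would no longer line up with the values passed to the function, and a different approach would be needed.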



Source: https://stackoverflow.com/questions/63039692/how-can-you-automate-the-addition-of-overall-percentages-to-the-row-summary-in-t
