How to add totals as well as group_by statistics in R

懵懂的女人 提交于 2019-12-11 15:16:56

问题


When computing any statistic using summarise and group_by we only get the summary statistic per-category, and not the value for all the population (Total). How to get both?

I am looking for something clean and short. Until now I can only think of:

bind_rows( 
  iris %>% group_by(Species) %>% summarise(
    "Mean" = mean(Sepal.Width), 
    "Median" = median(Sepal.Width), 
    "sd" = sd(Sepal.Width), 
    "p10" = quantile(Sepal.Width, probs = 0.1))
  , 
  iris %>% summarise(
    "Mean" = mean(Sepal.Width), 
    "Median" = median(Sepal.Width), 
    "sd" = sd(Sepal.Width), 
    "p10" = quantile(Sepal.Width, probs = 0.1)) %>% 
  mutate(Species = "Total")
  )

But I would like something more compact. In particular, I don't want to type the code (for summarize) twice, once for each group and once for the total.


回答1:


You can simplify it if you untangle what you're trying to do: you have iris data that has several species, and you want that summarized along with data for all species. You don't need to calculate those summary stats before you can bind. Instead, bind iris with a version of iris that's been set to Species = "Total", then group and summarize.

library(tidyverse)

bind_rows(
  iris,
  iris %>% mutate(Species = "Total")
) %>%
  group_by(Species) %>%
  summarise(Mean = mean(Sepal.Width),
            Median = median(Sepal.Width),
            sd = sd(Sepal.Width),
            p10 = quantile(Sepal.Width, probs = 0.1))
#> # A tibble: 4 x 5
#>   Species     Mean Median    sd   p10
#>   <chr>      <dbl>  <dbl> <dbl> <dbl>
#> 1 setosa      3.43    3.4 0.379  3   
#> 2 Total       3.06    3   0.436  2.5 
#> 3 versicolor  2.77    2.8 0.314  2.3 
#> 4 virginica   2.97    3   0.322  2.59

I like the caution in the comments above, though I have to do this sort of calculation for work enough that I have a similar shorthand function in a personal package. It perhaps makes less sense for things like standard deviations, but it's something I need to do a lot for adding up totals of demographic groups, etc. (If it's useful, that function is here).




回答2:


bit shorter, though quite similar to bind_rows

    q10 <- function(x){quantile(x , probs=0.1)}

    iris %>% 
      select(Species,Sepal.Width)%>%
      group_by(Species) %>% 
      summarise_all(c("mean", "sd", "q10")) %>% 
      t() %>% 

      cbind(c("total", iris %>% select(Sepal.Width) %>% summarise_all(c("mean", "sd", "q10")))) %>% 
      t()

more clean probably:

  bind_rows( 
    iris %>% 
      group_by(Species) %>%  
      select(Sepal.Width)%>%
      summarise_all(c("mean", "sd", "q10"))
    , 
    iris %>% 
      select(Sepal.Width)%>%
      summarise_all(c("mean", "sd", "q10")) %>% 
      mutate(Species = "Total")
  )


来源:https://stackoverflow.com/questions/51793505/how-to-add-totals-as-well-as-group-by-statistics-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!