(R, dplyr) select multiple columns starts with same string and summarise mean (90% CI) by group

断了今生、忘了曾经 提交于 2019-12-11 01:40:20

问题


I am new to tidyverse, conceptually I would like to calculate mean and 90% CI of all columns starts with "ab", grouped by "case". Tried many ways but none seem to work, my actual data has many columns so explicitly list them out is not an option.

test data below

library(tidyverse)

dat <- tibble(case= c("case1", "case1", "case2", "case2", "case3"), 
              abc = c(1, 2, 3, 1, 2), 
              abe = c(1, 3, 2, 3, 4), 
              bca = c(1, 6, 3, 8, 9))

below code is what i would like to do conceptually, but doesn't work, obviously

dat %>% group_by(`case`) %>% 
  summarise(mean=mean(select(starts_with("ab"))), 
            qt=quantile(select(starts_with("ab"), prob=c(0.05, 0.95))))

what I would like to get is something like below

case abc_mean abe_mean abc_lb abc_ub abe_lb abe_ub

  <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 case1      1.5      2.0   1.05   1.95   1.10   2.90
2 case2      2.0      2.5   1.10   2.90   2.05   2.95
3 case3      2.0      4.0   2.00   2.00   4.00   4.00

回答1:


You were very close, just move that select before the summarise. We then use summarise_all, and specify the appropriate functions within funs.

dat %>%
    group_by(case) %>%
    select(starts_with('ab')) %>%
    summarise_all(funs('mean' = mean, 'ub' = quantile(., .95), 'lb' = quantile(., .05)))

# # A tibble: 3 x 7
#    case abc_mean abe_mean abc_ub abe_ub abc_lb abe_lb
#   <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 case1      1.5      2.0   1.95   2.90   1.05   1.10
# 2 case2      2.0      2.5   2.90   2.95   1.10   2.05
# 3 case3      2.0      4.0   2.00   4.00   2.00   4.00

We use summarise_all instead of summarise because we wish to perform the same operations on multiple columns. It requires far less typing to use summarise_all instead of a summarise call in which we specify each column and each operation separately.




回答2:


Another option is summarise_at. vars(starts_with("ab")) is for selecting columns, funs(...) is to apply summarzing functions.

library(tidyverse)

dat2 <- dat %>% 
  group_by(case) %>% 
  summarise_at(vars(starts_with("ab")), funs(mean = mean(.),
                                             lb = quantile(., prob = 0.05),
                                             ub = quantile(., prob = 0.95))) 
dat2
# # A tibble: 3 x 7
#    case abc_mean abe_mean abc_lb abe_lb abc_ub abe_ub
#   <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 case1      1.5      2.0   1.05   1.10   1.95   2.90
# 2 case2      2.0      2.5   1.10   2.05   2.90   2.95
# 3 case3      2.0      4.0   2.00   4.00   2.00   4.00


来源:https://stackoverflow.com/questions/47293652/r-dplyr-select-multiple-columns-starts-with-same-string-and-summarise-mean-9

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!