In R, how do I calculate the mean of all columns, by group?

挽巷 2020-12-18 06:25

I need to get the mean of all columns of a large data set using R, grouped by 2 variables.

Let's try it with mtcars:

library(dplyr)
g_mtcars <- gro         


        
5 Answers
  • 2020-12-18 06:42

    For the sake of completeness, you could use the plyr package. This example computes the mean of a single column (hp) for each cyl/gear group:

    library(plyr)
    ddply(mtcars, c("cyl", "gear"), summarize, mean_hp = mean(hp))
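
    The call above averages only one column. A minimal sketch of the all-columns case, assuming plyr is installed: numcolwise() wraps mean so that it is applied to every numeric column of each group. The name all_means is illustrative, and the plyr:: prefix is used so that attaching plyr does not mask dplyr's summarize.

    ```r
    # Mean of every numeric column for each cyl/gear combination.
    # plyr:: is used instead of library(plyr) to avoid masking dplyr verbs.
    all_means <- plyr::ddply(mtcars, c("cyl", "gear"), plyr::numcolwise(mean))
    all_means
    ```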
    
  • 2020-12-18 06:45

    You can pass several mean() calls to dplyr::summarize(), one for each column you need:

    library(dplyr)
    
    mtcars %>% 
      group_by(cyl, gear) %>% 
      summarize(mean_hp = mean(hp), mean_wt = mean(wt))
    
    # Source: local data frame [8 x 4]
    # Groups: cyl [?]
    
    #     cyl  gear  mean_hp  mean_wt
    #   <dbl> <dbl>    <dbl>    <dbl>
    # 1     4     3  97.0000 2.465000
    # 2     4     4  76.0000 2.378125
    # 3     4     5 102.0000 1.826500
    # 4     6     3 107.5000 3.337500
    # 5     6     4 116.5000 3.093750
    # 6     6     5 175.0000 2.770000
    # 7     8     3 194.1667 4.104083
    # 8     8     5 299.5000 3.370000
    
  • 2020-12-18 06:50

    Edit 2: Recent versions of dplyr (1.0.0 and later) recommend the regular summarise() combined with across(), as in:

    library(dplyr)
    mtcars %>% 
        group_by(cyl, gear) %>%
        summarise(across(everything(), mean))
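
    On a larger data set, some columns may be non-numeric or contain missing values, and mean() then returns NA for them. A hedged sketch of one way to handle this, assuming dplyr 1.0.0 or later: the df object and its model column are made up here purely for illustration, where(is.numeric) restricts the summary to numeric columns, and .groups = "drop" returns an ungrouped result.

    ```r
    library(dplyr)

    # Assumption: df stands in for a data set that has a non-numeric column;
    # the model column is added only for illustration.
    df <- mtcars
    df$model <- rownames(mtcars)

    means_by_group <- df %>%
        group_by(cyl, gear) %>%
        summarise(
            # Average only numeric columns, skipping missing values.
            across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop"
        )
    means_by_group
    ```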
    

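    Original answer: what you're looking for is either ?summarise_all or ?summarise_each from dplyr. Both are older interfaces that have since been superseded or deprecated in favour of across().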

    Edit: full code:

    library(dplyr)
    mtcars %>% 
        group_by(cyl, gear) %>%
        summarise_all("mean")
    
    # Source: local data frame [8 x 11]
    # Groups: cyl [?]
    # 
    #     cyl  gear    mpg     disp       hp     drat       wt    qsec    vs    am     carb
    #   <dbl> <dbl>  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>   <dbl> <dbl> <dbl>    <dbl>
    # 1     4     3 21.500 120.1000  97.0000 3.700000 2.465000 20.0100   1.0  0.00 1.000000
    # 2     4     4 26.925 102.6250  76.0000 4.110000 2.378125 19.6125   1.0  0.75 1.500000
    # 3     4     5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000   0.5  1.00 2.000000
    # 4     6     3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300   1.0  0.00 1.000000
    # 5     6     4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700   0.5  0.50 4.000000
    # 6     6     5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000   0.0  1.00 6.000000
    # 7     8     3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425   0.0  0.00 3.083333
    # 8     8     5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500   0.0  1.00 6.000000
    
  • 2020-12-18 06:56

    aggregate() is the easiest way to do this in base R:

    aggregate(. ~ cyl + gear, data = mtcars, FUN = mean)
    #   cyl gear    mpg     disp       hp     drat       wt    qsec  vs   am     carb
    # 1   4    3 21.500 120.1000  97.0000 3.700000 2.465000 20.0100 1.0 0.00 1.000000
    # 2   6    3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300 1.0 0.00 1.000000
    # 3   8    3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425 0.0 0.00 3.083333
    # 4   4    4 26.925 102.6250  76.0000 4.110000 2.378125 19.6125 1.0 0.75 1.500000
    # 5   6    4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700 0.5 0.50 4.000000
    # 6   4    5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000 0.5 1.00 2.000000
    # 7   6    5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000 0.0 1.00 6.000000
    # 8   8    5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500 0.0 1.00 6.000000
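
    One caveat with the formula interface: its default na.action is na.omit, so a row with a missing value in any column is dropped before grouping. The sketch below contrasts this with the data-frame interface, which keeps every row and passes na.rm = TRUE on to mean(). The df copy and its injected NA are assumptions made for illustration only.

    ```r
    # Assumption: df is a copy of mtcars with one missing value injected.
    df <- mtcars
    df$hp[1] <- NA

    group_cols <- c("cyl", "gear")
    value_cols <- setdiff(names(df), group_cols)

    # Formula interface: the whole first row is dropped, so every column's
    # mean for its group (cyl 6, gear 4) is computed without that car.
    by_formula <- aggregate(. ~ cyl + gear, data = df, FUN = mean)

    # Data-frame interface: all rows are kept; only the missing hp cell
    # is skipped because na.rm = TRUE is forwarded to mean().
    by_list <- aggregate(df[value_cols], by = df[group_cols],
                         FUN = mean, na.rm = TRUE)
    ```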
    
  • 2020-12-18 07:00

    Using data.table. Note that you can't call setDT(mtcars) directly because the binding for mtcars is locked, so copy it to a different name such as mt_dt first:

    library(data.table)
    mt_dt <- as.data.table(mtcars)
    mt_dt[, lapply(.SD, mean), by = c("cyl", "gear")]
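
    If only some columns are needed, .SDcols restricts what .SD contains. A small sketch under that assumption: hp and wt are chosen only as an example, subset_means is an illustrative name, and keyby is used instead of by so the result comes back sorted by the grouping columns.

    ```r
    library(data.table)

    mt_dt <- as.data.table(mtcars)

    # .SDcols limits .SD to the listed columns; keyby sorts by cyl, gear.
    subset_means <- mt_dt[, lapply(.SD, mean), keyby = .(cyl, gear),
                          .SDcols = c("hp", "wt")]
    subset_means
    ```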
    