Group by using base R

温柔的废话 2020-11-30 14:30

Dataset

I have a dataset with the columns year, quarter, Channel, sales, and units.

df <- structure(list(year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
         


        
2 answers
  • 2020-11-30 15:19

    Here's another base R solution, using by():

    # One row per group: the grouping columns plus named summary columns
    do.call(rbind, by(df, df[, 1:3], 
                      function(x) cbind(x[1, 1:3], sumSales = sum(x$sales), meanUnits = mean(x$units))))
    

    Or using the split-apply-combine approach:

    t(sapply(split(df, df[, 1:3], drop = TRUE), 
                       function(x) c(sumSales = sum(x$sales), meanUnits = mean(x$units))))
    

    Or, similarly, with lapply() and rbind():

    do.call(rbind, lapply(split(df, df[, 1:3], drop = TRUE), 
                         function(x) c(sumSales = sum(x$sales), meanUnits = mean(x$units))))
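
    The split-based variants above return a matrix whose row names encode the group (for example "2013.Q1.AAA"). As a minimal sketch, assuming a small made-up data frame `toy` with the same column names as in the question (the values are hypothetical, not the asker's data), the grouping columns can be kept as real columns like this:

    ```r
    # Hypothetical toy data; only the column names come from the question
    toy <- data.frame(
      year    = c(2013L, 2013L, 2013L, 2014L),
      quarter = c("Q1", "Q1", "Q2", "Q1"),
      Channel = c("AAA", "AAA", "BBB", "AAA"),
      sales   = c(100, 200, 50, 70),
      units   = c(10, 20, 5, 7)
    )

    # Split: one data frame per observed (year, quarter, Channel) combination;
    # drop = TRUE discards combinations that never occur
    groups <- split(toy, toy[, 1:3], drop = TRUE)

    # Apply: summarise each piece, keeping the grouping columns
    pieces <- lapply(groups, function(x) {
      data.frame(x[1, 1:3], sumSales = sum(x$sales), meanUnits = mean(x$units))
    })

    # Combine: stack the pieces, sort by the grouping columns, reset row names
    res <- do.call(rbind, pieces)
    res <- res[order(res$year, res$quarter, res$Channel), ]
    rownames(res) <- NULL
    res
    ```

    Returning a one-row data frame from each group, rather than a named numeric vector, is what lets year, quarter, and Channel keep their original types in the result.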
    

    Edit: it seems that df is actually of class data.table (even though you asked for a base R solution only). Here's how you would do it with your data.table object:

    df[, .(sumSales = sum(sales), meanUnits = mean(units)), keyby = .(year, quarter, Channel)]
    #     year quarter Channel sumSales meanUnits
    #  1: 2013      Q1     AAA     4855      15.0
    #  2: 2013      Q1     BBB     2231      12.0
    #  3: 2013      Q2     AAA     4004      17.5
    #  4: 2013      Q2     BBB     2057      23.0
    #  5: 2013      Q3     AAA     2558      21.0
    #  6: 2013      Q3     BBB     4807      21.0
    #  7: 2013      Q4     AAA     4291      12.0
    #  8: 2013      Q4     BBB     1128      25.0
    #  9: 2014      Q1     AAA     2169      23.0
    # 10: 2014      Q1     CCC     3912      16.5
    # 11: 2014      Q2     AAA     2613      21.0
    # 12: 2014      Q2     BBB     1533      11.0
    # 13: 2014      Q2     CCC     2114      23.0
    # 14: 2014      Q3     BBB     5219      13.0
    # 15: 2014      Q3     CCC     1614      15.0
    # 16: 2014      Q4     AAA     2695      14.0
    # 17: 2014      Q4     BBB     4177      15.0
    
  • 2020-11-30 15:24

    You can try this:

    aggregate(sales ~ year + quarter + Channel, data = df, FUN = sum)  # sum of sales per group
    aggregate(units ~ year + quarter + Channel, data = df, FUN = mean) # mean of units per group
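
    The two calls above produce two separate data frames. If a single table is wanted, one possible way (a sketch, assuming a made-up data frame `toy` with the question's column names and hypothetical values) is to join the two results on the grouping columns with merge():

    ```r
    # Hypothetical toy data; only the column names come from the question
    toy <- data.frame(
      year    = c(2013L, 2013L, 2013L, 2014L),
      quarter = c("Q1", "Q1", "Q2", "Q1"),
      Channel = c("AAA", "AAA", "BBB", "AAA"),
      sales   = c(100, 200, 50, 70),
      units   = c(10, 20, 5, 7)
    )

    # One aggregate() call per statistic, as in the answer
    sales_sum  <- aggregate(sales ~ year + quarter + Channel, data = toy, FUN = sum)
    units_mean <- aggregate(units ~ year + quarter + Channel, data = toy, FUN = mean)

    # Join the two results on the grouping columns
    res <- merge(sales_sum, units_mean, by = c("year", "quarter", "Channel"))

    # Rename the value columns so they describe the statistic they hold
    names(res)[names(res) == "sales"] <- "sumSales"
    names(res)[names(res) == "units"] <- "meanUnits"
    res
    ```

    Note that the formula interface of aggregate() drops rows with missing values by default (na.action = na.omit), so the two intermediate results can differ in their groups if sales and units have NAs in different rows.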
    