For each group, summarise means for all variables in a data frame (ddply? split?)

难免孤独 2020-12-13 16:00

A week ago I would have done this manually: subset the data frame by group into new data frames, compute the mean of each variable in each one, then rbind the results. Very clunky ...
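
For reference, the manual approach described above might look roughly like the sketch below. The question does not show the data, so `test_data`, its column names, and its values are made up here purely for illustration.

```r
# Hypothetical stand-in for the real data (names and values are assumptions).
set.seed(1)
test_data <- data.frame(
  year  = rep(c(2007, 2009), times = 4),
  group = rep(c("a", "b"), each = 4),
  var0  = rnorm(8, mean = 40, sd = 10),
  var1  = runif(8)
)

# Manual version: one subset per group, column means of each, then rbind.
vars <- c("var0", "var1")
pieces <- lapply(unique(test_data$group), function(g) {
  sub <- test_data[test_data$group == g, ]
  # t() turns the named vector of means into a one-row matrix
  data.frame(group = g, t(colMeans(sub[, vars])))
})
manual <- do.call(rbind, pieces)
manual
```

This gives one row per group with the mean of each variable, which is the result the clunky subset-and-rbind route was aiming for.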

6 Answers
  •  天命终不由人
    2020-12-13 16:45

    First do a simple aggregate to get it summarized.

    df <- aggregate(cbind(var0, var1, var2, var3, var4) ~ year + group, test_data, mean)
    

    That makes a data.frame like this...

       year group     var0      var1     var2     var3     var4
    1  2007     a 42.25000 0.2031277 2.145394 2.801812 3.571999
    2  2009     a 30.50000 1.2033653 1.475158 3.618023 4.127601
    3  2007     b 52.60000 1.4564604 2.224850 3.053322 4.339109
    ...
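
    To try this without the original data, here is a minimal sketch on a made-up test_data (the values are assumptions, not the asker's data). It uses the `.` shorthand in the formula, which stands for every column not named on the right-hand side, so the variables need not be listed by hand.

    ```r
    # Hypothetical data: two years, two groups, two measured variables.
    test_data <- data.frame(
      year  = c(2007, 2007, 2009, 2009, 2007, 2007, 2009, 2009),
      group = c("a", "a", "a", "a", "b", "b", "b", "b"),
      var0  = c(40, 44, 30, 31, 50, 54, 20, 22),
      var1  = c(0.1, 0.3, 1.0, 1.4, 1.5, 1.3, 2.0, 2.2)
    )

    # One row per year/group combination, one mean per remaining column.
    summary_df <- aggregate(. ~ year + group, data = test_data, FUN = mean)
    summary_df
    ```

    Note that the formula interface of aggregate drops rows containing NA by default, so the means are computed on complete rows only.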
    

    That, by itself, is pretty close to what you wanted. You could just break it up by group now.

    l <- split(df, df$group)
    

    OK, so that's not quite it but we can refine the output if you really want to.

    # column 1 is year, so each column of the transposed table is labelled by year
    lapply(l, function(x) {d <- t(x[,3:7]); colnames(d) <- x[,1]; d})
    
    $a
               2007      2009
    var0 42.2500000 30.500000
    var1  0.2031277  1.203365
    var2  2.1453939  1.475158
    ...
    

    That doesn't have all your table formatting but it's organized exactly as you describe and is darn close. This last step you could pretty up how you like.

    This is the only answer here that matches the requested organization, and it's the fastest way to do it in R. BTW, I wouldn't bother with that last step; I'd just stick with the very first output from the aggregate, or maybe the split.
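
    If you do want the per-group tables, a variant of the split and transpose steps that picks columns by name rather than by position may be less fragile when the number of variables changes. This is only a sketch: summary_df below is a made-up summary in the same shape as the aggregate output, and the names by_group and tables are hypothetical.

    ```r
    # Hypothetical summary, shaped like the aggregate() result (values assumed).
    summary_df <- data.frame(
      year  = c(2007, 2009, 2007, 2009),
      group = c("a", "a", "b", "b"),
      var0  = c(42, 30.5, 52, 21),
      var1  = c(0.2, 1.2, 1.4, 2.1)
    )

    # One data frame per group.
    by_group <- split(summary_df, summary_df$group)

    # Transpose the value columns so variables are rows and years are columns.
    tables <- lapply(by_group, function(x) {
      value_cols <- setdiff(names(x), c("year", "group"))
      m <- t(x[, value_cols])
      colnames(m) <- as.character(x$year)
      m
    })
    tables$a
    ```

    Each element of tables is a numeric matrix, so a single value can be looked up with something like tables$a["var0", "2007"].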
