Group by multiple columns and sum other multiple columns

时间秒杀一切 submitted on 2019-11-26 08:51:37

The data.table way is:

DT[, lapply(.SD, sum), by = list(col1, col2, col3, ...)]

or

DT[, lapply(.SD, sum), by = colnames(DT)[1:10]]

where .SD is the (S)ubset of (D)ata for the current group, excluding the grouping columns. (Aside: if you need to refer to the current group's values generically, they are in .BY.)
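Here is a minimal runnable sketch of the data.table idiom on a made-up table. The names DT, g1, g2, x, y and label are hypothetical. It also uses .SDcols, which limits .SD to the listed columns. Without it, lapply(.SD, sum) would also try to sum the character column label and fail.

```r
library(data.table)

# Hypothetical toy table: grouping columns g1, g2; value columns x, y; a text column to skip
DT <- data.table(
  g1    = c("a", "a", "b", "b"),
  g2    = c(1, 1, 2, 3),
  x     = c(1, 2, 3, 4),
  y     = c(10, 20, 30, 40),
  label = c("p", "q", "r", "s")
)

# .SD is the current group's non-grouping columns; .SDcols restricts it to x and y,
# so lapply(.SD, sum) returns one summed column per value column
res <- DT[, lapply(.SD, sum), by = .(g1, g2), .SDcols = c("x", "y")]
print(res)
```

Groups appear in order of first appearance, so the rows are (a, 1), (b, 2) and (b, 3).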

This seems like a task for ddply (I use the 'baseball' dataset which is included with plyr):

library(plyr)
groupColumns = c("year","team")
dataColumns = c("hr", "rbi","sb")
res = ddply(baseball, groupColumns, function(x) colSums(x[dataColumns]))
head(res)

This gives, for each combination of the groupColumns, the sum of every column listed in dataColumns.
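To make the split-apply-combine step concrete, here is a rough base R equivalent of that ddply call, built from split(), colSums() and rbind(). It only illustrates the idea on made-up data. The column names mimic baseball, but the values are invented. This is not plyr's actual implementation.

```r
# Hypothetical toy data shaped like plyr's baseball dataset (same column names only)
df <- data.frame(
  year = c(2000, 2000, 2001, 2001),
  team = c("A", "A", "A", "B"),
  hr   = c(1, 2, 3, 4),
  rbi  = c(5, 6, 7, 8),
  sb   = c(0, 1, 0, 1)
)
groupColumns <- c("year", "team")
dataColumns  <- c("hr", "rbi", "sb")

# Split: one data frame per observed (year, team) combination; drop = TRUE skips empty ones
pieces <- split(df, df[groupColumns], drop = TRUE)

# Apply + combine: keep the group keys from the first row, append the column sums
res <- do.call(rbind, lapply(pieces, function(x) {
  cbind(x[1, groupColumns, drop = FALSE], t(colSums(x[dataColumns])))
}))

# Sort by the grouping columns and reset the row names for a tidy result
res <- res[order(res$year, res$team), ]
rownames(res) <- NULL
print(res)
```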

In base R this would be...

aggregate(as.matrix(df[, 11:200]), as.list(df[, 1:10]), FUN = sum)

EDIT: The aggregate function has come a long way since I wrote this. None of the casting above is necessary.

aggregate(df[, 11:200], df[, 1:10], FUN = sum)
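Here is a small runnable version of the same call on a made-up data frame. It has 2 grouping columns and 2 value columns instead of 10 and 190, so positions 1:2 and 3:4 stand in for 1:10 and 11:200.

```r
# Hypothetical data: columns 1:2 are grouping keys, columns 3:4 are values to sum
df <- data.frame(
  a1 = c("x", "x", "y", "y"),
  a2 = c("u", "u", "u", "v"),
  v1 = c(1, 2, 3, 4),
  v2 = c(5, 6, 7, 8)
)

# x = value columns, by = grouping columns; a data frame is already a list,
# so no as.matrix()/as.list() conversion is needed
res <- aggregate(df[, 3:4], df[, 1:2], FUN = sum)
print(res)
```

Because by is a named data frame, the output keeps the names a1 and a2. Passing an unnamed list would give Group.1, Group.2 instead.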

And there are a variety of ways to write this. Assuming the first 10 columns are named a1 through a10, I like the following, even though it is verbose.

aggregate(. ~ a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, data = df, FUN = sum)

(You could also build the formula as a string with paste() and convert it with formula() or as.formula().)
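Here is a sketch of that idea on a made-up data frame. groupCols and the column names are hypothetical. The formula is assembled as a string and then converted. stats::reformulate() is shown as an alternative that builds the same formula without manual pasting.

```r
# Hypothetical data: three grouping columns (a1..a3) and two value columns
df <- data.frame(
  a1 = c("x", "x", "y"),
  a2 = c("u", "u", "u"),
  a3 = c("k", "k", "k"),
  v1 = c(1, 2, 3),
  v2 = c(4, 5, 6)
)
groupCols <- paste0("a", 1:3)   # "a1" "a2" "a3"

# Build ". ~ a1 + a2 + a3" as a string, then turn it into a formula object;
# the "." on the left means "every column not used on the right"
f <- as.formula(paste(". ~", paste(groupCols, collapse = " + ")))
res <- aggregate(f, data = df, FUN = sum)
print(res)

# stats::reformulate() builds the same formula from a character vector of terms
f2 <- reformulate(groupCols, response = ".")
```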

The dplyr way would be:

library(dplyr)
df %>%
  group_by(col1, col2, col3) %>%
  summarise_each(funs(sum))

You can further choose which columns summarise_each() includes or excludes by using the selection helpers described in ?dplyr::select, such as starts_with() or contains().
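In dplyr 1.0 and later, summarise_each() and funs() are deprecated. The same grouped sum is usually written with across(). Here is a minimal sketch on a made-up data frame. The names df, col1, col2, v1 and v2 are hypothetical. A select helper such as starts_with("v") can replace everything() to restrict the columns.

```r
library(dplyr)

# Hypothetical data: col1, col2 are grouping keys; v1, v2 are numeric columns to sum
df <- data.frame(
  col1 = c("a", "a", "b"),
  col2 = c("u", "u", "u"),
  v1   = c(1, 2, 3),
  v2   = c(10, 20, 40)
)

# Inside summarise(), everything() skips the grouping columns, so sum() is applied
# to v1 and v2 only; .groups = "drop" returns an ungrouped result
res <- df %>%
  group_by(col1, col2) %>%
  summarise(across(everything(), sum), .groups = "drop")
print(res)
```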

Luciano Selzer

Using plyr::ddply:

library(plyr)
ddply(dtfr, .(name1, name2, namex), numcolwise(sum))
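Here is a minimal sketch of what numcolwise() contributes, using a made-up dtfr whose column names are hypothetical. numcolwise() wraps sum() so that it runs only on numeric columns. As a result, a character column such as note is dropped instead of causing an error.

```r
library(plyr)

# Hypothetical data: name1, name2 are keys; 'note' is character and will be skipped
dtfr <- data.frame(
  name1 = c("a", "a", "b"),
  name2 = c("u", "u", "u"),
  v1    = c(1, 2, 3),
  v2    = c(10, 20, 40),
  note  = c("p", "q", "r")
)

# numcolwise(sum) applies sum() to each numeric column of every (name1, name2) piece
res <- ddply(dtfr, .(name1, name2), numcolwise(sum))
print(res)
```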

Another way to do this with dplyr, which is generic (you don't need to list the columns), is:

df %>%
  group_by_if(is.factor) %>%
  summarize_if(is.numeric, sum, na.rm = TRUE)
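Here is a small runnable check of that pipeline on a made-up data frame. The names g1, g2, v1 and v2 are hypothetical. Every factor column becomes a grouping key, every numeric column is summed, and the NA is ignored. In dplyr 1.0 and later the *_if() verbs are superseded by across(), so the equivalent across() form of the summarising step is computed alongside for comparison.

```r
library(dplyr)

# Hypothetical data: factor columns g1, g2 act as keys; numeric v1, v2 (with an NA) get summed
df <- data.frame(
  g1 = factor(c("a", "a", "b")),
  g2 = factor(c("u", "u", "u")),
  v1 = c(1, NA, 3),
  v2 = c(10, 20, 40)
)

# The pipeline from the answer: group on every factor column, sum every numeric one, skip NAs
res <- df %>%
  group_by_if(is.factor) %>%
  summarize_if(is.numeric, sum, na.rm = TRUE)

# Same summarising step written with across() and where()
res2 <- df %>%
  group_by_if(is.factor) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)))
print(res)
```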