aggregating multiple columns in data.table

问题

I have the following sample data.table:

dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))

I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:

 dtb[,colSums, by="id"]

This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name

回答1:

this is actually what i was looking for and is mentioned in the FAQ:

dtb[,lapply(.SD,mean),by="id"]

回答2:

I guess in this case is it fastest to bring your data first into the long format and do your aggregation next (see Matthew's comments in this SO post):

library(data.table)
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
library(reshape2)
dt_long <- as.data.table(melt(dtb, id.var="id"))
dt_long[, sum(value), by=c("id","variable")]
    id variable  V1
 1:  1        a 601
 2:  2        a 440
 3:  3        a 496
 4:  4        a 553
 5:  5        a 444
 6:  6        a 466
 7:  7        a 525
 8:  8        a 553
 9:  9        a 541
...

来源：https://stackoverflow.com/questions/11695490/aggregating-multiple-columns-in-data-table

标签

dataframe

aggregate

data.table

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!