问题
I have the following sample data.table:
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:
dtb[,colSums, by="id"]
This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name
回答1:
this is actually what i was looking for and is mentioned in the FAQ:
dtb[,lapply(.SD,mean),by="id"]
回答2:
I guess in this case is it fastest to bring your data first into the long format and do your aggregation next (see Matthew's comments in this SO post):
library(data.table)
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
library(reshape2)
dt_long <- as.data.table(melt(dtb, id.var="id"))
dt_long[, sum(value), by=c("id","variable")]
id variable V1
1: 1 a 601
2: 2 a 440
3: 3 a 496
4: 4 a 553
5: 5 a 444
6: 6 a 466
7: 7 a 525
8: 8 a 553
9: 9 a 541
...
来源:https://stackoverflow.com/questions/11695490/aggregating-multiple-columns-in-data-table