My aggregation needs vary among columns / data.frames. I would like to pass the \"list\" argument to the data.table dynamically.
As a minimal example:
Another way is to use .SDcols
to group the columns for which you'd like to perform the same operations together. Let's say that you require columns a,d,e
to be summed by type
where as, b,g
should have mean
taken and c,f
its median, then,
# constructing an example data.table:
set.seed(45)
dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9),
b = rnorm(9), c=runif(9), d=sample(9), e=sample(9),
f = runif(9), g=rnorm(9))
# type a b c d e f g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3 1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597 1.7216593
# 4: bye 8 -0.2260640 0.3924327 8 2 0.1271187 0.4360063
# 5: bye 7 -1.0720503 0.3256450 7 8 0.5774691 0.7571990
# 6: bye 5 -0.7131021 0.4855804 6 9 0.2687791 1.5398858
# 7: ok 1 -0.4680549 0.8476840 2 4 0.5633317 1.5393945
# 8: ok 4 0.4183264 0.4402595 4 1 0.7592801 2.1829996
# 9: ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758
# 1) set key
setkey(dt, "type")
# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))
# 3) now use these ids in with .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]
# 4) merge them.
dt1[dt2[dt3]]
# type a d e b g c f
# 1: bye 20 21 19 -0.6704055 0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3: ok 14 11 10 -0.5104907 0.9090061 0.5080116 0.5633317
If/when you have many many column, making a list like the one you've might be cumbersome.