R Dynamically build “list” in data.table (or ddply)

野的像风 2020-12-10 04:29

My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to data.table dynamically.

As a minimal example:



        
4 Answers
  •  伪装坚强ぢ
    2020-12-10 04:40

    Another way is to use .SDcols to group together the columns that need the same operation. Say columns a, d, e should be summed by type, b, g should have their mean taken, and c, f their median. Then:

    library(data.table)
    
    # constructing an example data.table:
    set.seed(45)
    dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9), 
                     b = rnorm(9), c=runif(9), d=sample(9), e=sample(9), 
                     f = runif(9), g=rnorm(9))
    
    #     type a          b         c d e         f          g
    # 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
    # 2: hello 3  1.1773119 0.6559926 3 3 0.4586280 -0.8376586
    # 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597  1.7216593
    # 4:   bye 8 -0.2260640 0.3924327 8 2 0.1271187  0.4360063
    # 5:   bye 7 -1.0720503 0.3256450 7 8 0.5774691  0.7571990
    # 6:   bye 5 -0.7131021 0.4855804 6 9 0.2687791  1.5398858
    # 7:    ok 1 -0.4680549 0.8476840 2 4 0.5633317  1.5393945
    # 8:    ok 4  0.4183264 0.4402595 4 1 0.7592801  2.1829996
    # 9:    ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758
    
    # 1) set key
    setkey(dt, "type")
    
    # 2) group col-ids by similar operations
    id1 <- which(names(dt) %in% c("a", "d", "e"))
    id2 <- which(names(dt) %in% c("b","g"))
    id3 <- which(names(dt) %in% c("c","f"))
    
    # 3) now use these ids with the .SDcols parameter
    dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
    dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
    dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]
    
    # 4) merge them.
    dt1[dt2[dt3]]
    
    #     type  a  d  e          b          g         c         f
    # 1:   bye 20 21 19 -0.6704055  0.9110304 0.3924327 0.2687791
    # 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
    # 3:    ok 14 11 10 -0.5104907  0.9090061 0.5080116 0.5633317
    

    If you have many columns, building a list like the one you have by hand can become cumbersome.
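The grouping steps above can also be driven from a small specification, so that adding another operation does not require another hand-written line. The following is only a minimal sketch of that idea, not part of the original answer: the names `agg_spec`, `parts` and `result` are hypothetical, and `merge()` on the grouping column is used in place of the keyed join shown earlier.

```r
library(data.table)

# Example table with the same shape as the one in the answer
# (the drawn values depend on the R version, so none are shown here).
set.seed(45)
dt <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = sample(9), b = rnorm(9), c = runif(9), d = sample(9),
                 e = sample(9), f = runif(9), g = rnorm(9))

# Hypothetical specification: each entry pairs a function with the
# columns it should be applied to.
agg_spec <- list(
  list(fun = sum,    cols = c("a", "d", "e")),
  list(fun = mean,   cols = c("b", "g")),
  list(fun = median, cols = c("c", "f"))
)

# One grouped aggregation per entry, restricted to its columns via .SDcols.
parts <- lapply(agg_spec, function(spec) {
  dt[, lapply(.SD, spec$fun), by = "type", .SDcols = spec$cols]
})

# Combine the partial results on the grouping column.
result <- Reduce(function(x, y) merge(x, y, by = "type"), parts)
print(result)
```

Because the specification is ordinary R data, it can be built at run time, for example from a configuration table, which is the "dynamic" part the question asks about.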
