R Dynamically build “list” in data.table (or ddply)

野的像风 2020-12-10 04:29

My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to data.table dynamically.

As a minimal example:
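The example code itself is missing from this copy. Below is a sketch of a setup consistent with how the answers use it. The data values, DF and the string mylist are assumptions inferred from the answers: a grouping column type, value columns a, b and c, and mylist as a character string holding the list(...) call.

```r
library(data.table)

# Assumed data: a grouping column and three value columns.
set.seed(1)
DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = runif(9), b = runif(9), c = runif(9))
DF <- as.data.frame(DT)   # plain data.frame copy, as used by the ddply answer

# Assumed: the aggregation kept as a string so it can be assembled at run time.
mylist <- "list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))"

# The hard-coded aggregation that should become dynamic:
res <- DT[, list(lengtha = length(as.numeric(a)),
                 maxb    = max(as.numeric(b)),
                 meanc   = mean(as.numeric(c))), by = type]
res
```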



        
4 answers
  • 2020-12-10 04:37

    This is explained in FAQ 1.6: what you are looking for is quote and eval.

    Something like:

     mycall <- quote(list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c))))
    
     DT[, eval(mycall)]
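    The call does not have to be typed out with quote(); it can also be assembled from data. A minimal sketch, assuming a hypothetical spec list that maps each output name to a function name and an input column; call() and as.call() are base R, and the deterministic DT here is only for illustration.

    ```r
    library(data.table)

    DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                     a = 1:9, b = (1:9) / 10, c = 9:1)

    # Hypothetical spec: output name -> c(function name, input column)
    spec <- list(lengtha = c("length", "a"),
                 maxb    = c("max",    "b"),
                 meanc   = c("mean",   "c"))

    # Build one call per entry, e.g. max(b); lapply keeps the output names
    calls <- lapply(spec, function(s) call(s[1], as.name(s[2])))

    # Wrap them in list(...):
    # mycall is now the call list(lengtha = length(a), maxb = max(b), meanc = mean(c))
    mycall <- as.call(c(as.name("list"), calls))

    res <- DT[, eval(mycall), by = type]
    res
    ```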
    

    After a bit of head-banging, here is a very ugly way of constructing the call for ddply using .()

    myplyrcall <- .(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
    
    do.call(ddply,c(.data = quote(DF), .variables = 'type',.fun = quote(summarise),myplyrcall))
    

    You could also use as.quoted, which has an as.quoted.character method, so the expressions can be constructed as strings (e.g. with paste0):

    myplc <-as.quoted(c("lengtha" = "length(as.numeric(a))", "maxb" = "max(as.numeric(b))", "meanc" = "mean(as.numeric(c))"))
    

    This can be used with data.table as well!

    dtcall <- as.quoted(mylist)[[1]]
    
    
    DT[,eval(dtcall), by = type]
    

    data.table all the way.
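    If plyr is not otherwise needed, base R can do the same string-to-call conversion. A sketch assuming R >= 3.6.0 (for str2lang()); exprs is a hypothetical named character vector in the same shape as the one given to as.quoted() above, and DT is illustrative data.

    ```r
    library(data.table)

    DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                     a = 1:9, b = (1:9) / 10, c = 9:1)

    # Hypothetical named strings, e.g. produced with paste0()
    exprs <- c(lengtha = "length(as.numeric(a))",
               maxb    = "max(as.numeric(b))",
               meanc   = "mean(as.numeric(c))")

    # str2lang() parses each string into a call; lapply keeps the names
    calls  <- lapply(exprs, str2lang)
    dtcall <- as.call(c(as.name("list"), calls))

    res <- DT[, eval(dtcall), by = type]
    res
    ```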

  • 2020-12-10 04:40

    Another way is to use .SDcols to group together the columns on which you want to perform the same operation. Say columns a, d and e should be summed by type, b and g averaged, and c and f reduced to their median. Then:

    # constructing an example data.table:
    set.seed(45)
    dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9), 
                     b = rnorm(9), c=runif(9), d=sample(9), e=sample(9), 
                     f = runif(9), g=rnorm(9))
    
    #     type a          b         c d e         f          g
    # 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
    # 2: hello 3  1.1773119 0.6559926 3 3 0.4586280 -0.8376586
    # 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597  1.7216593
    # 4:   bye 8 -0.2260640 0.3924327 8 2 0.1271187  0.4360063
    # 5:   bye 7 -1.0720503 0.3256450 7 8 0.5774691  0.7571990
    # 6:   bye 5 -0.7131021 0.4855804 6 9 0.2687791  1.5398858
    # 7:    ok 1 -0.4680549 0.8476840 2 4 0.5633317  1.5393945
    # 8:    ok 4  0.4183264 0.4402595 4 1 0.7592801  2.1829996
    # 9:    ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758
    
    # 1) set key
    setkey(dt, "type")
    
    # 2) group col-ids by similar operations
    id1 <- which(names(dt) %in% c("a", "d", "e"))
    id2 <- which(names(dt) %in% c("b","g"))
    id3 <- which(names(dt) %in% c("c","f"))
    
    # 3) now use these ids with the .SDcols parameter
    dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
    dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
    dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]
    
    # 4) merge them.
    dt1[dt2[dt3]]
    
    #     type  a  d  e          b          g         c         f
    # 1:   bye 20 21 19 -0.6704055  0.9110304 0.3924327 0.2687791
    # 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
    # 3:    ok 14 11 10 -0.5104907  0.9090061 0.5080116 0.5633317
    

    If/when you have many, many columns, writing out a list like yours can be cumbersome.
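    The three grouped calls and the merge can also be collapsed into a single pass, which is convenient when the column groups are built programmatically. This is a swapped-in technique rather than the approach above: a sketch assuming a hypothetical named list ops that maps each column to its function, applied to .SD with Map(). Deterministic data replaces the random example so the results are easy to check.

    ```r
    library(data.table)

    x  <- as.numeric(1:9)
    dt <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                     a = x, b = x, c = x, d = x, e = x, f = x, g = x)

    # Column groups; in practice these could be computed at run time
    sum_cols    <- c("a", "d", "e")
    mean_cols   <- c("b", "g")
    median_cols <- c("c", "f")

    # Hypothetical spec: one function per column, named by column
    ops <- c(setNames(rep(list(sum),    length(sum_cols)),    sum_cols),
             setNames(rep(list(mean),   length(mean_cols)),   mean_cols),
             setNames(rep(list(median), length(median_cols)), median_cols))

    # .SD holds the columns in .SDcols order, so ops[[i]] is applied to .SD[[i]];
    # Map() keeps the names of ops, which become the output column names
    res <- dt[, Map(function(f, col) f(col), ops, .SD),
              by = type, .SDcols = names(ops)]
    res
    ```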

  • 2020-12-10 04:40

    Another method (supporting the use of paste or paste0 to build the expression):

    expr <- parse(text=mylist)
    DT[, eval( expr ), by=type]
    #-------
        type lengtha      maxb     meanc
    1: hello       3 0.8265407 0.5244094
    2:   bye       3 0.4955301 0.6289475
    3:    ok       3 0.9527455 0.5600915
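    To show the paste0 route end to end, here is a sketch that assembles the mylist string from vectors; out_names, funs and cols are hypothetical run-time inputs and DT is illustrative data. parse(text = ...)[[1]] extracts the single call from the parsed expression.

    ```r
    library(data.table)

    DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                     a = 1:9, b = (1:9) / 10, c = 9:1)

    # Hypothetical run-time specification
    out_names <- c("lengtha", "maxb", "meanc")
    funs      <- c("length", "max", "mean")
    cols      <- c("a", "b", "c")

    # paste0() is vectorised: one "name = fun(as.numeric(col))" term per element
    terms  <- paste0(out_names, " = ", funs, "(as.numeric(", cols, "))")
    mylist <- paste0("list(", paste(terms, collapse = ", "), ")")

    expr <- parse(text = mylist)[[1]]
    res  <- DT[, eval(expr), by = type]
    res
    ```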
    
  • 2020-12-10 04:41

    I find it worrisome that eval is apparently part of the answer. From your question it is not clear to me whether, and why, you really need to do what you describe. So here I demonstrate that you can also use a function:

    fun <- function(a,b,c) {
      list(lengtha = length(as.numeric(a)), 
              maxb = max(as.numeric(b)), 
             meanc = mean(as.numeric(c)))  
    }
    
    DT[, fun(a,b,c), by=type]
    
        type lengtha      maxb     meanc
    1: hello       3 0.8792184 0.3745643
    2:   bye       3 0.8718397 0.4519999
    3:    ok       3 0.8900764 0.4511536
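    The function approach can still take columns chosen at run time, without eval: pass them through .SDcols and hand .SD to the function. A sketch; cols is a hypothetical run-time choice, DT is illustrative data, and unname(as.list(.SD)) makes do.call() match the columns to fun's arguments by position rather than by name.

    ```r
    library(data.table)

    DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                     a = 1:9, b = (1:9) / 10, c = 9:1)

    fun <- function(a, b, c) {
      list(lengtha = length(as.numeric(a)),
           maxb    = max(as.numeric(b)),
           meanc   = mean(as.numeric(c)))
    }

    # Hypothetical run-time choice of the columns feeding fun's three arguments
    cols <- c("a", "b", "c")

    # .SD holds the chosen columns for each group, in .SDcols order
    res <- DT[, do.call(fun, unname(as.list(.SD))), by = type, .SDcols = cols]
    res
    ```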
    