R data.table - Apply function A to some columns and function B to some others

闹比i 2021-01-03 06:17

I want to aggregate the rows of a data.table, but the aggregation function depends on the name of the column.

For example, if the column name is:

  • variable1
1 answer
  • 2021-01-03 06:45

    Here is one way to do it with Map or mapply:

    Let's make some toy data first:

    library(data.table)

    # toy data: four numeric columns and a grouping column
    dt <- data.table(
        variable1 = rnorm(100),
        variable2 = rnorm(100),
        variable3 = rnorm(100),
        variable4 = rnorm(100),
        grp = sample(letters[1:5], 100, replace = TRUE)
    )
    
    colsToMean <- c("variable1", "variable2") 
    colsToMax <- c("variable3")   
    colsToSd <- c("variable4")
    

    Then,

    scols <- list(colsToMean, colsToMax, colsToSd)
    funs <- rep(c(mean, max, sd), lengths(scols))
    
    # summary
    dt[, Map(function(f, x) f(x), funs, .SD), by = grp, .SDcols = unlist(scols)]
    
    # or replace the original values with summary statistics as in OP
    dt[, unlist(scols) := Map(function(f, x) f(x), funs, .SD), by = grp, .SDcols = unlist(scols)]
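
    One detail about the summary call: Map() takes its result names from its first argument, and funs is an unnamed list, so the summary columns come out as V1, V2, and so on. A minimal sketch of a variant that keeps the original column names, assuming the same toy data (the fixed seed and the name res are additions for illustration only):

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
dt <- data.table(
    variable1 = rnorm(100),
    variable2 = rnorm(100),
    variable3 = rnorm(100),
    variable4 = rnorm(100),
    grp = sample(letters[1:5], 100, replace = TRUE)
)

scols <- list(c("variable1", "variable2"), "variable3", "variable4")
funs <- rep(c(mean, max, sd), lengths(scols))

# .SD is passed first, so Map() names the results after the columns
res <- dt[, Map(function(x, f) f(x), .SD, funs),
          by = grp, .SDcols = unlist(scols)]
names(res)
```

    Only the argument order changes; the functions are still paired with the columns by position.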
    

    Another option, which lets data.table's GForce optimization apply:

    scols <- list(colsToMean, colsToMax, colsToSd)
    funs <- rep(c('mean', 'max', 'sd'), lengths(scols))
    
    jexp <- paste0('list(', paste0(funs, '(', unlist(scols), ')', collapse = ', '), ')')
    dt[, eval(parse(text = jexp)), by = grp, verbose = TRUE]
    
    # Detected that j uses these columns: variable1,variable2,variable3,variable4 
    # Finding groups using forderv ... 0.000sec 
    # Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec 
    # Getting back original order ... 0.000sec 
    # lapply optimization is on, j unchanged as 'list(mean(variable1), mean(variable2), max(variable3), sd(variable4))'
    # GForce optimized j to 'list(gmean(variable1), gmean(variable2), gmax(variable3), gsd(variable4))'
    # Making each group and running j (GForce TRUE) ... 0.000sec 
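
    If pasting strings together and calling parse() feels fragile, the same j expression can be built from language objects instead. This is a sketch under the same assumptions as above (the fixed seed and the names cols, calls and res are illustrative, not part of the original answer). Whether GForce still applies can be checked with verbose = TRUE.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
dt <- data.table(
    variable1 = rnorm(100),
    variable2 = rnorm(100),
    variable3 = rnorm(100),
    variable4 = rnorm(100),
    grp = sample(letters[1:5], 100, replace = TRUE)
)

scols <- list(c("variable1", "variable2"), "variable3", "variable4")
funs <- rep(c("mean", "max", "sd"), lengths(scols))
cols <- unlist(scols)

# one call per column, e.g. mean(variable1), named after the column
calls <- Map(function(f, col) call(f, as.name(col)), funs, cols)
names(calls) <- cols

# assemble list(variable1 = mean(variable1), ..., variable4 = sd(variable4))
jexp <- as.call(c(list(as.name("list")), calls))

res <- dt[, eval(jexp), by = grp]
```

    Naming the calls also gives the result columns their original names rather than V1, V2, and so on.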
    