Apply a function over groups of columns

梦谈多话 2020-11-28 12:18

How can I use apply or a related function to create a new data frame that contains the row averages of each pair of columns in a very large data frame?

6 Answers
  •  离开以前
    2020-11-28 12:43

    This may generalize better to your situation because you pass a list of indices. If speed is an issue (large data frame), I'd opt for lapply with do.call rather than sapply:

    x <- list(1:3, 4:6)
    do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))
    

    This also works if you only have column names:

    x <- list(c('a','b','c'), c('d', 'e', 'f'))
    do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))
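
    To make the two calls above concrete, here is a minimal sketch on a small made-up data frame. The contents of `dat` and the helper name `group_means` are assumptions for illustration only; the original post does not define them. `drop = FALSE` is added so that a group with a single column still reaches rowMeans as a data frame.

    ```r
    # Hypothetical 4-row, 6-column data frame standing in for `dat`
    dat <- data.frame(a = 1:4, b = 5:8, c = 9:12,
                      d = 13:16, e = 17:20, f = 21:24)

    # The same column groups, given by position and by name
    by_index <- list(1:3, 4:6)
    by_name  <- list(c("a", "b", "c"), c("d", "e", "f"))

    # Illustrative helper: one column of row means per group
    group_means <- function(groups) {
      do.call(cbind, lapply(groups, function(i) rowMeans(dat[, i, drop = FALSE])))
    }

    res_index <- group_means(by_index)
    res_name  <- group_means(by_name)
    ```

    Both results are 4 x 2 matrices; wrap the call in as.data.frame() if a data frame is needed, as the question asks.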
    

    EDIT

    It just occurred to me that you may want to automate this for every three columns. I know there's a better way, but here it is on a 100-column data set:

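    dat <- data.frame(matrix(rnorm(16 * 100), ncol = 100))

    n <- 1:ncol(dat)
    # Pad the indices with NA so they fill a 3-column matrix, one group per row
    ind <- matrix(c(n, rep(NA, 3 - ncol(dat) %% 3)), byrow = TRUE, ncol = 3)
    # Drop the incomplete last group; each column of ind is now one group of 3
    ind <- data.frame(t(na.omit(ind)))
    do.call(cbind, lapply(ind, function(i) rowMeans(dat[, i])))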
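
    One possible way to build the groups without NA padding is to split the column positions by a group number. This is only a sketch, not the method from the post: the names `k`, `n_full`, `cols`, `groups` and `res` and the seed are assumptions. Like the code above, it ignores leftover columns that do not fill a complete group.

    ```r
    set.seed(1)  # arbitrary seed so the example is reproducible
    dat <- data.frame(matrix(rnorm(16 * 100), ncol = 100))

    k <- 3                                    # columns per group
    n_full <- (ncol(dat) %/% k) * k           # 99: last column in a complete group
    cols <- seq_len(n_full)

    # Group number for each column position: 1,1,1,2,2,2,...
    groups <- split(cols, ceiling(cols / k))

    # One column of row means per group
    res <- do.call(cbind, lapply(groups, function(i) rowMeans(dat[, i])))
    ```

    Here `res` has one row per row of `dat` and one column per complete group of three.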
    

    EDIT 2 I'm still not happy with the indexing. I think there's a better/faster way to pass the indexes. Here's a second, though still unsatisfying, method:

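    n <- 1:ncol(dat)
    # Fill a 3-row matrix column by column, so each column holds one group
    ind <- data.frame(matrix(c(n, rep(NA, 3 - ncol(dat) %% 3)), byrow = FALSE, nrow = 3))
    # Keep only the groups that contain no NA padding
    nonna <- sapply(ind, function(x) all(!is.na(x)))
    ind <- ind[, nonna]

    do.call(cbind, lapply(ind, function(i) rowMeans(dat[, i])))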
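
    If the groups are always consecutive and of equal size, another option is to skip the index structure and reshape the data into a three-dimensional array. This is a hedged sketch under that assumption, not something from the original answer; `k`, `n_groups`, `m`, `arr` and `res` are illustrative names, and the sketch makes no claim about relative speed.

    ```r
    set.seed(1)  # arbitrary seed so the example is reproducible
    dat <- data.frame(matrix(rnorm(16 * 100), ncol = 100))

    k <- 3
    n_groups <- ncol(dat) %/% k                     # 33 complete groups

    # Numeric matrix of the columns that fall in complete groups
    m <- as.matrix(dat[, seq_len(n_groups * k)])

    # arr[r, j, g] holds row r, j-th column of group g
    arr <- array(m, dim = c(nrow(m), k, n_groups))

    # Average over the within-group dimension, keeping rows and groups
    res <- apply(arr, c(1, 3), mean)
    ```

    The reshape relies on R storing matrices column by column, so consecutive columns end up in the same slice of `arr`.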
    
