How to use R to aggregate the same field by multiple separate groups

梦毁少年i 2020-12-21 14:01

I'm trying to perform a count of an indicator on several (actually hundreds of) groups separately (NOT on all combinations of all groups). I'll demonstrate it by simplified exa

2 Answers
  • 2020-12-21 14:39

    1) tapply The first argument of tapply is data with each column replaced by some_indicator. The second argument indicates that we wish to group by the groups in data and by the column number.

    result <- tapply(replace(data, TRUE, some_indicator), list(data, col(data)), sum)
    replace(unname(result), is.na(result), 0)
    

    For the input shown in the question, the last line gives:

         [,1] [,2] [,3]
    [1,]    1    1    0
    [2,]    2    1    1
    [3,]    0    1    2
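
    The question's input is not reproduced here, so the following is a minimal sketch on hypothetical data (the values of some_indicator and data below are assumptions, chosen only so that the result matches the matrix above). It walks through the same steps and shows where the NA comes from:

    ```r
    # Hypothetical input: 5 observations, 3 grouping columns with groups 1..3
    some_indicator <- c(1, 0, 1, 1, 0)
    data <- matrix(c(1, 1, 2, 2, 2,    # column 1 (group 3 never occurs here)
                     1, 1, 2, 3, 2,    # column 2
                     2, 1, 3, 3, 1),   # column 3
                   ncol = 3)

    # Every column of `stacked` is a copy of some_indicator (recycled by replace)
    stacked <- replace(data, TRUE, some_indicator)

    # Group by (group value, column number); empty cells come back as NA
    result <- tapply(stacked, list(data, col(data)), sum)

    # Drop the dimnames and turn the empty cells into 0
    counts <- replace(unname(result), is.na(result), 0)
    counts
    ```

    With this input, group 3 does not occur in column 1, so result[3, 1] is NA before the final replace.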
    

    1a) tapply A somewhat longer tapply solution is the following. fun takes a column as its argument and uses tapply to sum some_indicator within the groups defined by that column. Different columns may contain different sets of groups, so to ensure that every column yields the same set of groups (for later alignment) we group by factor(x, levs) rather than by x itself. sapply then applies fun to each column of data. The as.data.frame is needed because data is a matrix: applied to the matrix directly, sapply would iterate over individual elements rather than columns.

     levs <- levels(factor(data))
     fun <- function(x) tapply(some_indicator, factor(x, levs), sum)
     result <- sapply(as.data.frame(data), fun)
     replace(unname(result), is.na(result), 0)
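
    To see why the shared levels matter, here is a small sketch on hypothetical data (the input values are assumptions, not the question's data) comparing grouping by the raw column with grouping by factor(x, levs):

    ```r
    # Hypothetical input: column 1 contains only groups 1 and 2
    some_indicator <- c(1, 0, 1, 1, 0)
    data <- matrix(c(1, 1, 2, 2, 2,
                     1, 1, 2, 3, 2,
                     2, 1, 3, 3, 1),
                   ncol = 3)

    # Grouping by the raw column: the per-column results have different lengths,
    # so sapply cannot simplify them and returns a list
    ragged <- sapply(as.data.frame(data),
                     function(x) tapply(some_indicator, x, sum))
    lengths(ragged)

    # Grouping by a factor with the full set of levels: every column yields
    # one entry per group, so sapply returns an aligned matrix
    levs <- levels(factor(data))
    aligned <- sapply(as.data.frame(data),
                      function(x) tapply(some_indicator, factor(x, levs), sum))
    dim(aligned)
    ```

    Here ragged is a list whose first element has two entries and whose others have three, while aligned is a 3 x 3 matrix with NA in the cell for the group that is missing from column 1.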
    

    2) xtabs This is quite similar to the tapply solution. It has two advantages: (1) xtabs sums by default, so sum need not be specified, and (2) empty cells are filled with 0 rather than NA, which eliminates the extra step of replacing NAs with 0. On the other hand, each component of the formula must be unraveled into a vector using c because, unlike tapply, the xtabs formula will not accept matrices:

    result <- xtabs(c(replace(data, TRUE, some_indicator)) ~ c(data) + c(col(data)))
    dimnames(result) <- NULL
    

    For the data in the question this gives:

    > result
         [,1] [,2] [,3]
    [1,]    1    1    0
    [2,]    2    1    1
    [3,]    0    1    2
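
    As a sketch on hypothetical data (the input values and the names weights, group and column are assumptions introduced for illustration), the formula components can be built as separate vectors first, which makes the unraveling with c explicit:

    ```r
    # Hypothetical input: group 3 does not occur in column 1
    some_indicator <- c(1, 0, 1, 1, 0)
    data <- matrix(c(1, 1, 2, 2, 2,
                     1, 1, 2, 3, 2,
                     2, 1, 3, 3, 1),
                   ncol = 3)

    # Unravel each matrix into a plain vector (column-major order)
    weights <- c(replace(data, TRUE, some_indicator))  # indicator, once per column
    group   <- c(data)                                 # group value of each cell
    column  <- c(col(data))                            # column number of each cell

    # xtabs sums `weights` within each (group, column) cell; empty cells are 0
    tab <- xtabs(weights ~ group + column)

    # Plain numeric matrix without dimnames or table attributes
    counts <- matrix(as.vector(tab), nrow = nrow(tab))
    counts
    ```

    No NA replacement is needed here: the cell for group 3 in column 1 is already 0.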
    

    REVISED: Revised the tapply solution and added the xtabs solution.

  • 2020-12-21 14:41

    melt from "reshape2" has a method for matrices which could be useful here. Using "reshape2", the solution could be as straightforward as:

    library(reshape2)
    dcast(cbind(some_indicator, melt(data)), 
          value ~ Var2, value.var= "some_indicator", 
          fun.aggregate=sum)
    #   value 1 2 3
    # 1     1 1 1 0
    # 2     2 2 1 1
    # 3     3 0 1 2
    

    This answer assumes some prior knowledge of how melt works on a matrix, in particular that it will create a three-column data.frame with "Var1" representing the rownames (or numbers), "Var2" representing the colnames (or numbers), and "value" representing the values from the matrix.
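
    For readers without that background, the long format described above can be approximated in base R. This is only a sketch on hypothetical data (the input values are assumptions), not reshape2's implementation; it builds the three columns by hand and then aggregates them with xtabs instead of dcast:

    ```r
    # Hypothetical input
    some_indicator <- c(1, 0, 1, 1, 0)
    data <- matrix(c(1, 1, 2, 2, 2,
                     1, 1, 2, 3, 2,
                     2, 1, 3, 3, 1),
                   ncol = 3)

    # One row per matrix cell: row index, column index, and the cell's value
    long <- data.frame(Var1  = c(row(data)),
                       Var2  = c(col(data)),
                       value = c(data))

    # Attach the indicator to every cell of its row, repeating it once per column
    long$some_indicator <- rep(some_indicator, times = ncol(data))

    # Sum the indicator for each (value, Var2) pair
    wide <- xtabs(some_indicator ~ value + Var2, data = long)
    wide
    ```

    The rows of wide correspond to the group values and the columns to the original matrix columns, which is the same layout as the dcast output shown in the answer.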
