How to use R to aggregate the same field by multiple separate groups

淺唱寂寞╮ submitted on 2019-11-29 16:00:01

1) tapply The first argument of tapply is data with every column replaced by some_indicator, so each cell carries the indicator value for its row. The second argument says to group by the group labels in data and by the column number.

result <- tapply(replace(data, TRUE, some_indicator), list(data, col(data)), sum)
replace(unname(result), is.na(result), 0)

For the input shown in the question, the last line gives:

     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    1
[3,]    0    1    2
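The question's input is not reproduced on this page, so the following is only a sketch with a hypothetical data matrix and some_indicator vector, chosen so that the result matches the table above. It spells out the intermediate values that the one-liner passes to tapply.

```r
# Hypothetical input (an assumption, not the question's original data):
# a 4 x 3 matrix of group labels and one indicator value per row.
data <- matrix(c(1, 1, 2, 2,
                 1, 1, 2, 3,
                 2, 1, 3, 3), nrow = 4)
some_indicator <- c(1, 0, 1, 1)

# replace() recycles some_indicator down every column, so `values` has the
# same shape as data and each column is a copy of some_indicator.
values <- replace(data, TRUE, some_indicator)

# col(data) holds the column number of each cell; together with data it
# forms the two grouping variables.
result <- tapply(values, list(data, col(data)), sum)

# Group/column combinations that never occur come back as NA; set them to 0.
result <- replace(unname(result), is.na(result), 0)
result
```

Here group 3 never appears in the first column, which is why the NA replacement step is needed.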

1a) tapply A somewhat longer tapply solution would be the following. fun takes a column as its argument and uses tapply to sum some_indicator within groups, using that column as the grouping variable. However, different columns could contain different sets of groups, so to ensure that they all share the same set of groups (for later alignment) we actually group by factor(x, levs). The sapply applies fun to each column of data. The as.data.frame is needed because data is a matrix, so sapply would iterate over each element rather than each column if we applied it to data directly.

levs <- levels(factor(data))
fun <- function(x) tapply(some_indicator, factor(x, levs), sum)
result <- sapply(as.data.frame(data), fun)
replace(unname(result), is.na(result), 0)
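To see why the shared level set matters, here is a small sketch on hypothetical input (the matrix and vector below are assumptions, not the question's data). It compares grouping by a raw column with grouping by factor(x, levs).

```r
# Hypothetical input (an assumption; the question's data is not shown here).
data <- matrix(c(1, 1, 2, 2,
                 1, 1, 2, 3,
                 2, 1, 3, 3), nrow = 4)
some_indicator <- c(1, 0, 1, 1)

first_col <- data[, 1]  # contains groups 1 and 2 only

# Grouping by the raw column yields only the groups present in it.
short <- tapply(some_indicator, first_col, sum)

# Grouping by a factor with the full level set keeps a slot for group 3,
# filled with NA because no row of this column belongs to it.
levs <- levels(factor(data))
full <- tapply(some_indicator, factor(first_col, levs), sum)

length(short)
length(full)

# Without the shared levels the per-column results differ in length, so
# sapply cannot simplify them to a matrix and returns a list instead.
ragged <- sapply(as.data.frame(data), function(x) tapply(some_indicator, x, sum))
is.list(ragged)
```

With the common levels every column produces a vector of the same length, which is what lets sapply bind the results into a matrix.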

2) xtabs This is quite similar to the tapply solution. It has two advantages: (1) sum is implied by xtabs and so need not be specified, and (2) unfilled cells are filled with 0 rather than NA, eliminating the extra step of replacing NAs with 0. On the other hand, we must unravel each component of the formula into a vector using c since, unlike tapply, the xtabs formula will not accept matrices:

result <- xtabs(c(replace(data, TRUE, some_indicator)) ~ c(data) + c(col(data)))
dimnames(result) <- NULL

For the data in the question this gives:

> result
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    1
[3,]    0    1    2
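As a variation, the same cross-tabulation can be written against a long data frame with named columns, which some readers may find easier to follow than wrapping each term in c(). This is a sketch on hypothetical input; the column names value, group and column are made up for illustration, and it assumes some_indicator has one entry per row of data.

```r
# Hypothetical input (an assumption, not the question's original data).
data <- matrix(c(1, 1, 2, 2,
                 1, 1, 2, 3,
                 2, 1, 3, 3), nrow = 4)
some_indicator <- c(1, 0, 1, 1)

# One row per matrix cell, in column-major order.
long <- data.frame(
  value  = rep(some_indicator, times = ncol(data)),  # indicator of the cell's row
  group  = c(data),                                  # group label in the cell
  column = c(col(data))                              # column number of the cell
)

# xtabs sums `value` within each group/column combination and puts 0 in
# combinations that do not occur.
result <- xtabs(value ~ group + column, data = long)
result
```

Unlike the tapply version, no NA cleanup is needed afterwards.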


melt from "reshape2" has a method for matrices which could be useful here. Using "reshape2", the solution could be as straightforward as:

library(reshape2)
dcast(cbind(some_indicator, melt(data)), 
      value ~ Var2, value.var= "some_indicator", 
      fun.aggregate=sum)
#   value 1 2 3
# 1     1 1 1 0
# 2     2 2 1 1
# 3     3 0 1 2

This answer assumes some prior knowledge of how melt works on a matrix, in particular that it will create a three-column data.frame with "Var1" representing the rownames (or numbers), "Var2" representing the colnames (or numbers), and "value" representing the values from the matrix.
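For readers without that prior knowledge, the long layout described above can be imitated in base R. This is only a sketch on hypothetical input, not a call to melt itself; it builds a data frame with the same three columns and then attaches some_indicator the way the cbind call does.

```r
# Hypothetical input (an assumption, not the question's original data).
data <- matrix(c(1, 1, 2, 2,
                 1, 1, 2, 3,
                 2, 1, 3, 3), nrow = 4)
some_indicator <- c(1, 0, 1, 1)

# Base R imitation of the three-column layout described above.
long <- data.frame(
  Var1  = c(row(data)),  # row number of each cell
  Var2  = c(col(data)),  # column number of each cell
  value = c(data)        # content of the cell, i.e. the group label
)

# cbind() recycles some_indicator over the 12 rows of the long frame, so
# each cell is paired with the indicator belonging to its row (Var1).
long <- cbind(some_indicator, long)
head(long, 4)
```

Casting value against Var2 and summing some_indicator then produces one row per group and one column per matrix column.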
