How to use R for same-field aggregation by multiple separate groups

I'm trying to perform a count of an indicator on several (actually hundreds of) groups separately (NOT on all combinations of all groups). I'll demonstrate it by simplified exa

2 answers

别那么骄傲

2020-12-21 14:39
1) tapply. The first argument of tapply is data with each column replaced by some_indicator, so the values being summed line up cell for cell with the group labels in data. The second argument tells tapply to group both by the values in data and by the column number.
```
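# Replace every column of data with some_indicator, then sum by (group, column)
result <- tapply(replace(data, TRUE, some_indicator), list(data, col(data)), sum)
# Drop the dimnames and turn NA (empty cells) into 0
replace(unname(result), is.na(result), 0)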
```
For the input shown in the question, the last line gives:
```
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    1
[3,]    0    1    2
```
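To make the mechanics easier to follow, here is a minimal sketch on made-up input. The 4x3 matrix data and the vector some_indicator below are hypothetical stand-ins, not the data from the question, so the result differs from the output above.

```r
# Hypothetical input (NOT the question's data): 4 rows, 3 grouping columns
data <- matrix(c(1, 1, 2, 2,    # column 1: group 3 never occurs here
                 1, 2, 3, 3,    # column 2
                 3, 3, 2, 1),   # column 3
               nrow = 4)
some_indicator <- c(1, 0, 1, 1)

# Every column of data is overwritten with some_indicator (recycled column-wise)
x <- replace(data, TRUE, some_indicator)

# Sum x within each (group value, column number) cell
result <- tapply(x, list(data, col(data)), sum)

# Empty cells come back as NA; turn them into 0 and drop the dimnames
out <- replace(unname(result), is.na(result), 0)
out
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    2    0    1
# [3,]    0    2    1
```

Row i of the result is group i and column j is the j-th grouping column. The 0 in row 3, column 1 started out as NA because group 3 does not appear in the first column.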
1a) tapply. A somewhat longer tapply solution is the following. fun takes a column as its argument and uses tapply to sum some_indicator within the groups of that column. Different columns could contain different sets of groups, so to ensure that they all report the same set of groups (for later alignment) we actually group by factor(x, levs). sapply then applies fun to each column of data. The as.data.frame is needed because data is a matrix: applied to the matrix directly, sapply would iterate over each element rather than each column.
```
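# All group values that occur anywhere in data
levs <- levels(factor(data))
# Sum some_indicator within the groups of one column, using the shared levels
fun <- function(x) tapply(some_indicator, factor(x, levs), sum)
result <- sapply(as.data.frame(data), fun)
# Drop the dimnames and turn NA (empty cells) into 0
replace(unname(result), is.na(result), 0)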
```
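The alignment issue can be illustrated with a small sketch. The input below is hypothetical (it is not the question's data) and is chosen so that group 3 is missing from the first column.

```r
# Hypothetical input (NOT the question's data)
data <- matrix(c(1, 1, 2, 2,    # column 1: group 3 never occurs here
                 1, 2, 3, 3,
                 3, 3, 2, 1),
               nrow = 4)
some_indicator <- c(1, 0, 1, 1)
as_cols <- as.data.frame(data)  # columns are named V1, V2, V3

# Without shared levels, each column yields one entry per group it contains
ragged <- lapply(as_cols, function(x) tapply(some_indicator, x, sum))
lengths(ragged)   # V1 has 2 entries, V2 and V3 have 3

# With shared levels, every column yields one entry per level,
# with NA where a group is absent, so the results stack into a matrix
levs <- levels(factor(data))
aligned <- sapply(as_cols, function(x) tapply(some_indicator, factor(x, levs), sum))
dim(aligned)      # 3 groups by 3 columns
```

Because the unaligned results have different lengths, sapply could not simplify them into a matrix. Fixing the levels up front is what makes the row positions comparable across columns.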
2) xtabs. This is quite similar to the tapply solution. It has two advantages: (1) the sum is implied by xtabs and so need not be specified, and (2) unfilled cells are filled with 0 rather than NA, which eliminates the extra step of replacing NAs with 0. On the other hand, we must unravel each component of the formula into a vector using c since, unlike tapply, the xtabs formula will not accept matrices:
```
result <- xtabs(c(replace(data, TRUE, some_indicator)) ~ c(data) + c(col(data)))
dimnames(result) <- NULL
```
For the data in the question this gives:
```
> result
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    1
[3,]    0    1    2
```
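As a rough illustration of the flattening step, the sketch below names each flattened vector before passing it to xtabs. The input and the helper names value, group and column are hypothetical and chosen only for readability.

```r
# Hypothetical input (NOT the question's data)
data <- matrix(c(1, 1, 2, 2,    # column 1: group 3 never occurs here
                 1, 2, 3, 3,
                 3, 3, 2, 1),
               nrow = 4)
some_indicator <- c(1, 0, 1, 1)

# xtabs needs plain vectors, so each matrix is flattened with c()
value  <- c(replace(data, TRUE, some_indicator))  # indicator, repeated per column
group  <- c(data)                                 # group label of each cell
column <- c(col(data))                            # column number of each cell

# The left-hand side is summed within each (group, column) cell
result <- xtabs(value ~ group + column)

# Group 3 is absent from column 1, so this cell is 0 rather than NA
result["3", "1"]
```

All three vectors have one entry per matrix cell, in the same column-major order, which is what keeps the indicator, the group label and the column number lined up.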
REVISED: Revised the tapply solution and added the xtabs solution.
盖世英雄少女心

2020-12-21 14:41
melt from "reshape2" has a method for matrices which could be useful here. Using "reshape2", the solution could be as straightforward as:
```
library(reshape2)
dcast(cbind(some_indicator, melt(data)), 
      value ~ Var2, value.var= "some_indicator", 
      fun.aggregate=sum)
#   value 1 2 3
# 1     1 1 1 0
# 2     2 2 1 1
# 3     3 0 1 2
```
This answer assumes some prior knowledge of how melt works on a matrix, in particular that it will create a three-column data.frame with "Var1" representing the rownames (or numbers), "Var2" representing the colnames (or numbers), and "value" representing the values from the matrix.
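For readers without that background, the three-column layout can be approximated in base R. This is only a sketch that mimics the shape described above; it does not call reshape2, and the input is hypothetical rather than the question's data.

```r
# Hypothetical input (NOT the question's data)
data <- matrix(c(1, 1, 2, 2,    # column 1: group 3 never occurs here
                 1, 2, 3, 3,
                 3, 3, 2, 1),
               nrow = 4)
some_indicator <- c(1, 0, 1, 1)

# Base R approximation of the long layout (assumption: this mimics,
# but is not produced by, melt)
long <- data.frame(Var1  = c(row(data)),   # row number of each cell
                   Var2  = c(col(data)),   # column number of each cell
                   value = c(data))        # group label stored in the cell

# Attach the indicator that belongs to each original row
long$some_indicator <- some_indicator[long$Var1]

# Sum the indicator per (value, Var2) pair, one output column per Var2
wide <- xtabs(some_indicator ~ value + Var2, data = long)
wide
```

The sketch indexes some_indicator by Var1 explicitly, whereas the cbind call in the answer relies on some_indicator being recycled down the rows of the melted data. Both give the same pairing as long as the long table keeps the matrix's column-major order.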