I'm trying to perform a count of an indicator on several (actually hundreds of) groups separately (NOT on all combinations of all groups). I'll demonstrate it by simplified exa
1) tapply The first argument of tapply is data with each column replaced by some_indicator. The second argument indicates that we wish to group by the groups in data and by the column number.
result <- tapply(replace(data, TRUE, some_indicator), list(data, col(data)), sum)
replace(unname(result), is.na(result), 0)
For the input shown in the question, the last line gives:
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    1
[3,]    0    1    2
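As a self-contained check of the approach above, here is a minimal sketch on made-up input. The values of some_indicator and data below are assumptions chosen for illustration, not the data from the question, so the result differs from the matrix shown above.

```r
# Hypothetical input (NOT the question's data): 4 observations, 3 grouping columns.
some_indicator <- c(1, 0, 1, 1)
data <- matrix(c(1, 2, 2, 2,    # column 1: group label of each observation
                 1, 2, 3, 3,    # column 2
                 3, 3, 2, 1),   # column 3
               nrow = 4)

# Overwrite every column of `data` with the indicator (recycled column by column).
values <- replace(data, TRUE, some_indicator)

# Group by (label, column number) and sum; combinations that never occur come back as NA.
result <- tapply(values, list(data, col(data)), sum)
out <- replace(unname(result), is.na(result), 0)
out
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    2    0    1
# [3,]    0    2    1
```

Group 3 never appears in the first column, which is why that cell is NA before the final replace.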
1a) tapply A somewhat longer tapply solution would be the following. fun takes a column as its argument and uses tapply to sum some_indicator within the groups defined by that column; however, different columns could have different sets of groups in them, so to ensure that they all have the same set of groups (for later alignment) we actually group by factor(x, levs). The sapply applies fun to each column of data. The as.data.frame is needed since data is a matrix, so sapply would apply across each element rather than each column if we were to apply it to the matrix directly.
levs <- levels(factor(data))
fun <- function(x) tapply(some_indicator, factor(x, levs), sum)
result <- sapply(as.data.frame(data), fun)
replace(unname(result), is.na(result), 0)
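To see why the shared levels matter, the following sketch compares grouping with and without factor(x, levs). The input is hypothetical (not the question's data), and the names no_levs and with_levs are introduced here only for illustration.

```r
# Hypothetical input (NOT the question's data).
some_indicator <- c(1, 0, 1, 1)
data <- matrix(c(1, 2, 2, 2,
                 1, 2, 3, 3,
                 3, 3, 2, 1), nrow = 4)

# Without shared levels each column only reports the groups it contains,
# so the per-column results have different lengths and cannot be aligned.
no_levs <- lapply(as.data.frame(data), function(x) tapply(some_indicator, x, sum))
lengths(no_levs)
# V1 V2 V3
#  2  3  3

# With shared levels every column reports all three groups (NA where a group is absent).
levs <- levels(factor(data))
with_levs <- sapply(as.data.frame(data),
                    function(x) tapply(some_indicator, factor(x, levs), sum))
dim(with_levs)
# [1] 3 3
```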
2) xtabs This is quite similar to the tapply solution. It does have the advantages that: (1) sum is implied by xtabs and so need not be specified, and (2) unfilled cells are filled with 0 rather than NA, eliminating the extra step of replacing NAs with 0. On the other hand, we must unravel each component of the formula into a vector using c since, unlike tapply, the xtabs formula will not accept matrices:
result <- xtabs(c(replace(data, TRUE, some_indicator)) ~ c(data) + c(col(data)))
dimnames(result) <- NULL
For the data in the question this gives:
> result
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    1
[3,]    0    1    2
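The same idea can be tried on made-up input. This is only a sketch: the data is hypothetical (not the question's), and the intermediate names values, group and column are introduced here to make the formula easier to read.

```r
# Hypothetical input (NOT the question's data).
some_indicator <- c(1, 0, 1, 1)
data <- matrix(c(1, 2, 2, 2,
                 1, 2, 3, 3,
                 3, 3, 2, 1), nrow = 4)

# Flatten everything to plain vectors of equal length before building the formula.
values <- c(replace(data, TRUE, some_indicator))
group  <- c(data)
column <- c(col(data))

# xtabs sums `values` within each (group, column) cell; empty cells are 0, not NA.
tab <- xtabs(values ~ group + column)
tab[3, 1]   # group 3 never occurs in column 1, so this cell is 0
```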
REVISED Revised tapply solution and added xtabs solution.
melt from "reshape2" has a method for matrices which could be useful here. Using "reshape2", the solution could be as straightforward as:
library(reshape2)
dcast(cbind(some_indicator, melt(data)),
      value ~ Var2, value.var = "some_indicator",
      fun.aggregate = sum)
#   value 1 2 3
# 1     1 1 1 0
# 2     2 2 1 1
# 3     3 0 1 2
This answer assumes some prior knowledge of how melt works on a matrix, in particular that it will create a three-column data.frame with "Var1" representing the rownames (or numbers), "Var2" representing the colnames (or numbers), and "value" representing the values from the matrix.
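For readers who have not used melt on a matrix, this short sketch shows the intermediate long-format table that dcast then reshapes. It assumes the "reshape2" package is installed, and the input is hypothetical (not the question's data).

```r
library(reshape2)

# Hypothetical input (NOT the question's data): a matrix without dimnames.
some_indicator <- c(1, 0, 1, 1)
data <- matrix(c(1, 2, 2, 2,
                 1, 2, 3, 3,
                 3, 3, 2, 1), nrow = 4)

# One row per matrix cell: row index, column index, and the cell's value (the group label).
long <- melt(data)
names(long)
# [1] "Var1"  "Var2"  "value"

# cbind recycles the indicator down the rows, lining it up with Var1 within each column.
long2 <- cbind(some_indicator, long)
head(long2, 4)
```

Because the indicator is recycled once per matrix column, each row of long2 pairs an observation's indicator with its group label in a given column, which is exactly what dcast aggregates.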