I wanted to sum individual columns by group, and my first thought was to use tapply. However, I cannot get tapply to work. Can tapply
I looked at the source code for by, as EDi suggested. That code was substantially more complex than my change to the one line in tapply. I have now found that my.tapply does not work with the more complex scenario below, where apples and cherries are summed by state and county. If I get my.tapply to work with this case, I can post the code here later:
df.2 <- read.table(text = '
state county apples cherries plums
AA 1 1 2 3
AA 1 1 2 3
AA 2 10 20 30
AA 2 10 20 30
AA 3 100 200 300
AA 3 100 200 300
BB 7 -1 -2 -3
BB 7 -1 -2 -3
BB 8 -10 -20 -30
BB 8 -10 -20 -30
BB 9 -100 -200 -300
BB 9 -100 -200 -300
', header = TRUE, stringsAsFactors = FALSE)
# my function works
tapply(df.2$apples , list(df.2$state, df.2$county), function(x) {sum(x)})
my.tapply(df.2$apples , list(df.2$state, df.2$county), function(x) {sum(x)})
# my function works
tapply(df.2$cherries, list(df.2$state, df.2$county), function(x) {sum(x)})
my.tapply(df.2$cherries, list(df.2$state, df.2$county), function(x) {sum(x)})
# my function does not work
my.tapply(df.2[,3:4], list(df.2$state, df.2$county), function(x) {colSums(x)})
You're looking for by. It uses the INDEX in the way that you assumed tapply would, by row.
by(df.1, df.1$state, function(x) colSums(x[,3:5]))
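The same idea should carry over to the two-factor case from the question by passing a list of grouping vectors as the index. This is only a sketch: it reuses the df.2 data posted above, and the names state and county given to the list elements are my own choice, not something by requires.

```r
# df.2 as posted in the question
df.2 <- read.table(text = '
state county apples cherries plums
AA 1 1 2 3
AA 1 1 2 3
AA 2 10 20 30
AA 2 10 20 30
AA 3 100 200 300
AA 3 100 200 300
BB 7 -1 -2 -3
BB 7 -1 -2 -3
BB 8 -10 -20 -30
BB 8 -10 -20 -30
BB 9 -100 -200 -300
BB 9 -100 -200 -300
', header = TRUE, stringsAsFactors = FALSE)

# Group rows by state and county, then sum the apples and cherries columns
res <- by(df.2[, c("apples", "cherries")],
          list(state = df.2$state, county = df.2$county),
          colSums)

# One cell of the result: the column sums for state AA, county 1
res[["AA", "1"]]
```

Combinations that never occur in the data (for example state AA with county 7) come back as empty cells.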
The problem with your use of tapply is that you were indexing the data.frame by column (because a data.frame is really just a list of columns). So tapply complained that your index didn't match the length of your data.frame, which is 5.
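To see the mismatch directly, compare the length of a data.frame with the length of one of its columns. The toy data frame below is a made-up stand-in with the same five columns, not the df.1 from the question.

```r
# toy is a hypothetical stand-in for the data in the question
toy <- data.frame(state    = c("AA", "AA", "BB", "BB"),
                  county   = c(1, 2, 7, 8),
                  apples   = c(1, 10, -1, -10),
                  cherries = c(2, 20, -2, -20),
                  plums    = c(3, 30, -3, -30))

length(toy)        # counts list elements, i.e. columns: 5
nrow(toy)          # counts rows: 4
length(toy$state)  # an index built from a column has one entry per row: 4
```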
tapply works on a vector. For a data.frame you can use by (which is a wrapper for tapply; take a look at the code):
> by(df.1[,c(3:5)], df.1$state, FUN=colSums)
df.1$state: AA
apples cherries plums
111 222 333
-------------------------------------------------------------------------------------
df.1$state: BB
apples cherries plums
-111 -222 -333
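If you would rather stay with tapply, one option is to call it once per column, since each column is a plain vector. A minimal sketch, again using a made-up toy data frame rather than df.1:

```r
# toy is a hypothetical stand-in for the data in the question
toy <- data.frame(state    = c("AA", "AA", "BB", "BB"),
                  apples   = c(1, 10, -1, -10),
                  cherries = c(2, 20, -2, -20),
                  plums    = c(3, 30, -3, -30))

# Apply tapply to each numeric column in turn; sapply binds the
# per-column results into a matrix with one row per state
sums <- sapply(toy[, c("apples", "cherries", "plums")],
               function(col) tapply(col, toy$state, sum))
sums
```

The result is a matrix rather than a by object, which can be easier to index or convert back to a data.frame.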