Sum multiple columns by group with tapply

I wanted to sum individual columns by group and my first thought was to use tapply. However, I cannot get tapply to work. Can tapply

3 Answers

慢半拍i

2020-12-15 23:45

I looked at the source code for by, as EDi suggested. That code was substantially more complex than my change to the one line in tapply. I have now found that my.tapply does not work with the more complex scenario below where apples and cherries are summed by state and county. If I get my.tapply to work with this case I can post the code here later:

df.2 <- read.table(text = '

    state   county   apples   cherries   plums
       AA        1        1          2       3
       AA        1        1          2       3
       AA        2       10         20      30
       AA        2       10         20      30
       AA        3      100        200     300
       AA        3      100        200     300

       BB        7       -1         -2      -3
       BB        7       -1         -2      -3
       BB        8      -10        -20     -30
       BB        8      -10        -20     -30
       BB        9     -100       -200    -300
       BB        9     -100       -200    -300

', header = TRUE, stringsAsFactors = FALSE)

# my function works

   tapply(df.2$apples  , list(df.2$state, df.2$county), function(x) {sum(x)})
my.tapply(df.2$apples  , list(df.2$state, df.2$county), function(x) {sum(x)})

# my function works

   tapply(df.2$cherries, list(df.2$state, df.2$county), function(x) {sum(x)})
my.tapply(df.2$cherries, list(df.2$state, df.2$county), function(x) {sum(x)})

# my function does not work

my.tapply(df.2[,3:4], list(df.2$state, df.2$county), function(x) {colSums(x)})
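
For the two-factor case above, one option that avoids modifying tapply is base R's aggregate. The following is only a sketch under that assumption: it swaps in aggregate rather than fixing my.tapply, and it rebuilds df.2 compactly with the same values so that it runs on its own.

```r
# Rebuild df.2 from the post (same values, written compactly).
df.2 <- data.frame(
  state    = rep(c("AA", "BB"), each = 6),
  county   = rep(c(1, 2, 3, 7, 8, 9), each = 2),
  apples   = rep(c(1, 10, 100, -1, -10, -100), each = 2),
  cherries = rep(c(2, 20, 200, -2, -20, -200), each = 2),
  plums    = rep(c(3, 30, 300, -3, -30, -300), each = 2),
  stringsAsFactors = FALSE
)

# Sum apples and cherries within each observed state/county pair.
# cbind() on the left-hand side asks aggregate for several columns at once.
sums <- aggregate(cbind(apples, cherries) ~ state + county,
                  data = df.2, FUN = sum)
print(sums)
```

Unlike the tapply calls, which return a state-by-county matrix padded with NA for combinations that never occur, this returns one row per combination that is actually present in the data.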


隐瞒了意图╮

2020-12-15 23:49
You're looking for by. It uses the INDEX in the way that you assumed tapply would, by row.
```
by(df.1, df.1$state, function(x) colSums(x[,3:5]))
```
The problem with your use of tapply is that you were indexing the data.frame by column, because a data.frame is really just a list of columns. So tapply complained that your index didn't match the length of your data.frame, which is 5 (its number of columns, not its number of rows).
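
The length mismatch can be checked directly. This is a small sketch that uses df.2 from the other answer as a stand-in, since df.1 is not reproduced here; it is rebuilt compactly with the same values.

```r
# Stand-in data: df.2 from the other answer, rebuilt with the same values.
df.2 <- data.frame(
  state    = rep(c("AA", "BB"), each = 6),
  county   = rep(c(1, 2, 3, 7, 8, 9), each = 2),
  apples   = rep(c(1, 10, 100, -1, -10, -100), each = 2),
  cherries = rep(c(2, 20, 200, -2, -20, -200), each = 2),
  plums    = rep(c(3, 30, 300, -3, -30, -300), each = 2),
  stringsAsFactors = FALSE
)

n.cols  <- length(df.2)        # a data.frame's length is its column count
n.rows  <- nrow(df.2)          # number of rows
n.index <- length(df.2$state)  # the grouping vector has one entry per row

# by() splits the rows by the index, then applies the function to each piece.
res <- by(df.2, df.2$state, function(x) colSums(x[, 3:5]))
print(res[["AA"]])
```

Here n.index matches n.rows but not n.cols, which is the mismatch described above.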

悲&欢浪女

2020-12-16 00:06

tapply works on a vector. For a data.frame you can use by (which is a wrapper for tapply; take a look at the code):

> by(df.1[,c(3:5)], df.1$state, FUN=colSums)
df.1$state: AA
  apples cherries    plums 
     111      222      333 
------------------------------------------------------------------------------------- 
df.1$state: BB
  apples cherries    plums 
    -111     -222     -333
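
The by result prints as separate blocks. If a single table is more convenient, one common idiom is to bind the pieces row-wise with do.call(rbind, ...). A minimal sketch follows, using df.2 from the first answer as stand-in data because df.1 is not shown here.

```r
# Stand-in data: df.2 from the first answer, rebuilt with the same values.
df.2 <- data.frame(
  state    = rep(c("AA", "BB"), each = 6),
  county   = rep(c(1, 2, 3, 7, 8, 9), each = 2),
  apples   = rep(c(1, 10, 100, -1, -10, -100), each = 2),
  cherries = rep(c(2, 20, 200, -2, -20, -200), each = 2),
  plums    = rep(c(3, 30, 300, -3, -30, -300), each = 2),
  stringsAsFactors = FALSE
)

# Per-state column sums, as in the by() call above.
res <- by(df.2[, 3:5], df.2$state, FUN = colSums)

# Stack the per-group vectors into one matrix: one row per state.
m <- do.call(rbind, res)
print(m)
```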
