Multiple Aggregation in R [duplicate]

╄→尐↘猪︶ㄣ submitted on 2019-12-13 20:12:14

Question


I have three parameters (3 columns)

x <- c(1, 1, 2, 2, 2, 2, 1, 1, 2) 
y <- c(1, 1, 1, 2, 2, 2, 3, 3, 3) 

and

z <- c(10, NA, 16, 25, 41, NA, 17, 53, 26)

For each value of y, I need to calculate the mean of column z over the rows where x==1.

How can I do it using the aggregate function in R?

data <- data.frame(x=c(1, 1, 2, 2, 2, 2, 1, 1, 2), 
                   y=c(1, 1, 1, 2, 2, 2, 3, 3, 3), 
                   z=c(10, NA, 16, 25, 41, NA, 17, 53, 26))

data
  x y  z
1 1 1 10
2 1 1 NA
3 2 1 16
4 2 2 25
5 2 2 41
6 2 2 NA
7 1 3 17
8 1 3 53
9 2 3 26

Answer 1:


Here's one way of going about it, using tapply:

with(data, tapply(z, list(x==1, y), mean, na.rm=TRUE)['TRUE', ])

#  1  2  3 
# 10 NA 35
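To see why indexing with 'TRUE' works, it may help to look at the intermediate table that tapply builds. The following is only a sketch that rebuilds the data frame from the question; the names m and res are introduced here for illustration.

```r
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Mean of z for every combination of (x == 1) and y.
# Rows are named "FALSE"/"TRUE", columns are named after the y values.
m <- with(data, tapply(z, list(x == 1, y), mean, na.rm = TRUE))
print(m)

# Keep only the row for x == 1. There are no rows with x == 1 and y == 2,
# so that cell is empty and tapply leaves it as NA.
res <- m["TRUE", ]
print(res)
```

The "FALSE" row holds the means for the rows where x is not 1, which is why the one-liner discards it.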

More generally, to apply an arbitrary function to groups where x==1, and return NA for groups that don't have x==1, we can use aggregate and merge:

merge(aggregate(z~y, data[data$x==1,], function(x) {
 c(mean=mean(x, na.rm=TRUE), quantile(x, na.rm=TRUE))
}), list(y=unique(data$y)), all=TRUE)

#   y z.mean z.0% z.25% z.50% z.75% z.100%
# 1 1     10   10    10    10    10     10
# 2 2     NA   NA    NA    NA    NA     NA
# 3 3     35   17    26    35    44     53
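If only the mean is needed, the same aggregate-then-merge idea can be reduced to something like the sketch below. It rebuilds the data frame from the question; the names means and result are assumptions made for this example.

```r
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Mean of z per y, using only the rows where x == 1.
# The formula interface drops rows with NA in z by default (na.action = na.omit).
means <- aggregate(z ~ y, data = data[data$x == 1, ], FUN = mean)

# Left-join onto every y value so that groups without x == 1 show up as NA.
result <- merge(data.frame(y = unique(data$y)), means, all.x = TRUE)
print(result)
```

The merge step is what brings y == 2 back; aggregate alone returns rows only for the groups present in the subset.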



Answer 2:


Here is another one-liner with aggregate, for the sake of golf.

aggregate(z~y, within(data, z <- ifelse(x==1,z,NA)), mean, na.rm=TRUE, na.action=na.pass)

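It is suboptimal: it returns NaN instead of NA for y==2, because every z in that group is set to NA and then removed by na.rm=TRUE, so mean() is left with an empty vector, just like mean(numeric(0)).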
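If NA is preferred over NaN, one possible follow-up is to convert the value afterwards. This is a sketch rather than part of the original answer; it rebuilds the data frame from the question, and the name res is introduced here.

```r
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Blank out z wherever x != 1, keep the NA rows (na.pass),
# and let mean() drop them per group (na.rm = TRUE).
res <- aggregate(z ~ y, within(data, z <- ifelse(x == 1, z, NA)),
                 mean, na.rm = TRUE, na.action = na.pass)

# The mean of an empty numeric vector is NaN, which is what y == 2 gets.
print(mean(numeric(0)))

# Replace NaN with NA to match the output of the tapply approach.
res$z[is.nan(res$z)] <- NA
print(res)
```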



Source: https://stackoverflow.com/questions/24215817/multiple-aggregation-in-r
