Question
I have three variables (three columns):
x <- c(1, 1, 2, 2, 2, 2, 1, 1, 2)
y <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
z <- c(10, NA, 16, 25, 41, NA, 17, 53, 26)
For each value of y, I need to calculate the mean of z over the rows where x == 1.
How can I do this with the aggregate function in R?
data <- data.frame(x=c(1, 1, 2, 2, 2, 2, 1, 1, 2),
y=c(1, 1, 1, 2, 2, 2, 3, 3, 3),
z=c(10, NA, 16, 25, 41, NA, 17, 53, 26))
data
#   x y  z
# 1 1 1 10
# 2 1 1 NA
# 3 2 1 16
# 4 2 2 25
# 5 2 2 41
# 6 2 2 NA
# 7 1 3 17
# 8 1 3 53
# 9 2 3 26
Answer 1:
Here's one way of going about it, using tapply:
with(data, tapply(z, list(x==1, y), mean, na.rm=TRUE)['TRUE', ])
#  1  2  3
# 10 NA 35
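To see what the ['TRUE', ] index is selecting, here is a small sketch that keeps the intermediate table from the same tapply call. The variable name tab is only illustrative; the data frame is the one from the question.

```r
# Same data as in the question
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Cross-tabulate: rows are the levels of x == 1 ("FALSE", "TRUE"),
# columns are the levels of y; each cell is the mean of z for that combination
tab <- with(data, tapply(z, list(x == 1, y), mean, na.rm = TRUE))
print(tab)  # FALSE row: 16, 33, 26; TRUE row: 10, NA, 35

# Keep only the x == 1 row, which is what the one-liner above returns
print(tab["TRUE", ])
```

The x == 1, y == 2 cell has no observations at all, so mean is never called for it; tapply fills such empty cells with its default value, NA, which is why this approach reports NA rather than NaN for y == 2.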
More generally, to apply an arbitrary function to the x == 1 rows of each y group, and return NA for groups that contain no x == 1 rows, we can combine aggregate and merge:
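# Summarise only the x == 1 rows (the formula interface drops rows with NA z by default)
merge(aggregate(z~y, data[data$x==1,], function(v) {
  c(mean=mean(v, na.rm=TRUE), quantile(v, na.rm=TRUE))
}), list(y=unique(data$y)), all=TRUE)  # merge against every y so empty groups become NA rows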
#   y z.mean z.0% z.25% z.50% z.75% z.100%
# 1 1     10   10    10    10    10     10
# 2 2     NA   NA    NA    NA    NA     NA
# 3 3     35   17    26    35    44     53
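For the plain mean, the aggregate-plus-merge pattern can be reduced to the two steps sketched below. This version filters with the formula method's subset argument instead of data[data$x==1,] (an equivalent way to select the same rows here, not what the answer itself uses) and merges with all.x = TRUE against a data frame of every y value; agg, all_y and res are illustrative names.

```r
# Same data as in the question
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Step 1: aggregate only the x == 1 rows; the default na.action = na.omit
# drops the row whose z is NA before the means are computed
agg <- aggregate(z ~ y, data = data, FUN = mean, subset = x == 1)
print(agg)  # only y = 1 and y = 3 appear, since y = 2 has no x == 1 rows

# Step 2: left-join onto the full set of y values so missing groups come back as NA
all_y <- data.frame(y = sort(unique(data$y)))
res <- merge(all_y, agg, by = "y", all.x = TRUE)
print(res)  # y = 1, 2, 3 with z = 10, NA, 35
```

If dropping the y == 2 group is acceptable, step 1 alone is probably all the question needs.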
Answer 2:
Here is another one-liner with aggregate, for the sake of golf.
aggregate(z~y, within(data, z <- ifelse(x==1,z,NA)), mean, na.rm=TRUE, na.action=na.pass)
It is suboptimal, and it returns NaN rather than NA for y == 2: every z in that group has been set to NA, so mean(..., na.rm=TRUE) reduces to mean(numeric(0)), which is NaN.
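If the NaN is unwanted, one possible patch, sketched under the assumption that a NaN in this result can only mean "no x == 1 rows in the group", is to convert NaN to NA after aggregating. The variable name res is illustrative.

```r
# Same data as in the question
data <- data.frame(x = c(1, 1, 2, 2, 2, 2, 1, 1, 2),
                   y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   z = c(10, NA, 16, 25, 41, NA, 17, 53, 26))

# Blank out z where x != 1, then aggregate every y group;
# na.action = na.pass keeps the all-NA rows so y == 2 still gets a row
res <- aggregate(z ~ y, within(data, z <- ifelse(x == 1, z, NA)),
                 mean, na.rm = TRUE, na.action = na.pass)
print(res$z)  # the y == 2 entry is NaN, because mean(numeric(0)) is NaN

# Replace NaN with NA (assumes the original z contains no genuine NaN values)
res$z[is.nan(res$z)] <- NA
print(res)
```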
Source: https://stackoverflow.com/questions/24215817/multiple-aggregation-in-r