问题
Does anybody know how to aggregate by NA in R.
If you take the example below
a <- matrix(1,5,2)
a[1:2,2] <- NA
a[3:5,2] <- 2
aggregate(a[,1], by=list(a[,2]), sum)
The output is:
Group.1 x
2 3
But is there a way to get the output to include NAs in the output like this:
Group.1 x
2 3
NA 2
Thanks
回答1:
Instead of aggregate(), you may want to consider rowsum(). It is actually designed for this exact operation on matrices and is known to be much faster than aggregate(). We can add NA to the factor levels of a[, 2] with addNA(). This will assure that NA shows up as a grouping variable.
rowsum(a[, 1], addNA(a[, 2]))
# [,1]
# 2 3
# <NA> 2
If you still want to use aggregate(), you can incorporate addNA() as well.
aggregate(a[, 1], list(Group = addNA(a[, 2])), sum)
# Group x
# 1 2 3
# 2 <NA> 2
And one more option with data.table -
library(data.table)
as.data.table(a)[, .(x = sum(V1)), by = .(Group = V2)]
# Group x
# 1: NA 2
# 2: 2 3
回答2:
Use summarize from dplyr
library(dplyr)
a %>%
as.data.frame %>%
group_by(V2) %>%
summarize(V1_sum = sum(V1))
回答3:
Using
sqldf:
a <- as.data.frame(a)
sqldf("SELECT V2 [Group], SUM(V1) x
FROM a
GROUP BY V2")
Output:
Group x
1 NA 2
2 2 3
stats package
A variation of AdamO's proposal:
data.frame(xtabs( V1 ~ V2 , data = a,na.action = na.pass, exclude = NULL))
Output:
V2 Freq
1 2 3
2 <NA> 2
回答4:
You can also try aggregating by is.na(a[,2]) instead.
aggregate(a[,1], by=list(is.na(a[,2])), sum)
# Group.1 x
# 1 FALSE 3
# 2 TRUE 2
If you want a finer distinction than just NA or not, then you may want to define a new variable that uses an previously unused value to denote NA (a factor would be more elegant, but a numeric vector is the simplest):
b <- a[,2]
b[is.na(b)] <- 999
aggregate(a[,1], by=list(b), sum)
# Group.1 x
# 1 2 3
# 2 999 2
回答5:
The addNA solution of Rich doesn't require any substantial change to the aggregate syntax, so I think it's the best solution. I'll point out that another option, which produces output similar to table (and thus can be coerced into a data.frame structure similar to that of aggregate) is xtabs.
xtabs(a[, 1] ~ a[, 2], addNA=T)
Gives:
Group.1 x
1 2 3
2 <NA> 2
Another "trick" I see is assigning a missing code to these data. We all like the NA output of R, but assigning a missing code to a grouping variable is a good coding exercise. We take it so that it has one more digit than the largest value in the dataset and is of the form -999...99.
codemiss <- function(x) -10^(floor(log(max(abs(x), na.rm=T), base=10))+2)-1
works in general.
Then you get
a[, 2][is.na(a[, 2])] <- codemiss(a[, 2])
And:
aggregate(a[, 1], list(a[, 2]), sum)
Gives you:
Group.1 x
1 -99 2
2 2 3
来源:https://stackoverflow.com/questions/32214141/aggregate-by-na-in-r