Aggregate by NA in R

心不动则不痛 submitted on 2019-12-30 09:43:39

Question


Does anybody know how to aggregate by NA in R?

Take the example below:

a <- matrix(1,5,2)
a[1:2,2] <- NA
a[3:5,2] <- 2
aggregate(a[,1], by=list(a[,2]), sum)

The output is:

Group.1 x
2       3

Is there a way to include the NA group in the output, like this?

Group.1 x
2       3
NA      2

Thanks


Answer 1:


Instead of aggregate(), you may want to consider rowsum(). It is designed for exactly this operation on matrices and is known to be much faster than aggregate(). We can add NA to the factor levels of a[, 2] with addNA(). This ensures that NA shows up as a grouping value.

rowsum(a[, 1], addNA(a[, 2]))
#      [,1]
# 2       3
# <NA>    2

If you still want to use aggregate(), you can incorporate addNA() as well.

aggregate(a[, 1], list(Group = addNA(a[, 2])), sum)
#   Group x
# 1     2 3
# 2  <NA> 2
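
To see why addNA() helps, it may be useful to inspect the grouping vector before and after the conversion. The following is only a minimal sketch using the matrix from the question; the names g_na and res are illustrative and not part of the original answer.

```r
# Rebuild the example matrix from the question
a <- matrix(1, 5, 2)
a[1:2, 2] <- NA
a[3:5, 2] <- 2

# The first two grouping values are missing; aggregate() omits such rows
is.na(a[, 2])

# addNA() returns a factor in which NA is an explicit level,
# so those entries are no longer treated as missing
g_na <- addNA(a[, 2])
levels(g_na)
is.na(g_na)

# Grouping by the converted factor keeps the NA group
res <- aggregate(a[, 1], list(Group = g_na), sum)
res
```

Because the NA entries now point at a regular factor level, the rows are no longer dropped from the grouping.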

And one more option, with data.table:

library(data.table)
as.data.table(a)[, .(x = sum(V1)), by = .(Group = V2)]
#    Group x
# 1:    NA 2
# 2:     2 3



Answer 2:


Use summarize() from dplyr:

library(dplyr)

a %>%
  as.data.frame %>%
  group_by(V2) %>%
  summarize(V1_sum = sum(V1))



Answer 3:


Using sqldf:

library(sqldf)
a <- as.data.frame(a)  # columns are named V1 and V2
sqldf("SELECT V2 [Group], SUM(V1) x
      FROM a
      GROUP BY V2")

Output:

  Group x
1    NA 2
2     2 3

stats package

A variation of AdamO's proposal:

data.frame(xtabs(V1 ~ V2, data = a, na.action = na.pass, exclude = NULL))

Output:

    V2 Freq
1    2    3
2 <NA>    2



Answer 4:


You can also try aggregating by is.na(a[,2]) instead.

aggregate(a[,1], by=list(is.na(a[,2])), sum)

#   Group.1 x
# 1   FALSE 3
# 2    TRUE 2

If you want a finer distinction than just NA or not, you may want to define a new variable that uses a previously unused value to denote NA (a factor would be more elegant, but a numeric vector is the simplest):

b <- a[,2]
b[is.na(b)] <- 999
aggregate(a[,1], by=list(b), sum)

#   Group.1 x
# 1       2 3
# 2     999 2
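
The factor variant mentioned above might look like the following sketch. The label "missing" and the names grp and res are arbitrary choices for illustration and do not come from the original answer.

```r
# Rebuild the example matrix from the question
a <- matrix(1, 5, 2)
a[1:2, 2] <- NA
a[3:5, 2] <- 2

# Keep NA as a factor level, then give that level a printable label
grp <- addNA(a[, 2])
levels(grp)[is.na(levels(grp))] <- "missing"

# The relabelled level now forms its own group
res <- aggregate(a[, 1], by = list(Group = grp), sum)
res
```

Unlike a numeric placeholder such as 999, a labelled level cannot collide with a real value in the data.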



Answer 5:


Rich's addNA solution doesn't require any substantial change to the aggregate syntax, so I think it's the best solution. I'll point out another option, xtabs, which produces output similar to table (and thus can be coerced into a data.frame structure similar to that of aggregate).

xtabs(a[, 1] ~ a[, 2], addNA = TRUE)

Gives:

  Group.1 x
1       2 3
2    <NA> 2

Another "trick" is to assign a missing code to these data. We all like the NA output of R, but assigning a missing code to a grouping variable is a good coding exercise. We choose the code so that it has one more digit than the largest value in the dataset and is of the form -999...99.

codemiss <- function(x) -10^(floor(log(max(abs(x), na.rm = TRUE), base = 10)) + 2) + 1

This works in general.

Then recode the missing values:

a[, 2][is.na(a[, 2])] <- codemiss(a[, 2])

And:

aggregate(a[, 1], list(a[, 2]), sum)

Gives you:

  Group.1 x
1     -99 2
2       2 3
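
As a quick check of the arithmetic, the sketch below computes the code for a few inputs. It defines its own helper, missing_code (a hypothetical name, not from the original answer), written so that the result has the form -99...9 with one more digit than the largest absolute value, which matches the -99 in the output above. It assumes the input has at least one non-missing, nonzero value.

```r
# Hypothetical helper: a negative all-nines code with one more digit
# than the largest absolute value in x
missing_code <- function(x) {
  # Number of integer digits in the largest absolute value
  digits <- floor(log10(max(abs(x), na.rm = TRUE))) + 1
  # 10^(digits + 1) - 1 is a run of (digits + 1) nines
  -(10^(digits + 1) - 1)
}

missing_code(c(NA, NA, 2, 2, 2))  # largest value has 1 digit
missing_code(c(5, 42, NA))        # largest value has 2 digits
missing_code(c(-150, 3, NA))      # largest absolute value has 3 digits
```

Note that in R the ^ operator binds more tightly than unary minus, so an expression such as -10^2 evaluates to -100.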


Source: https://stackoverflow.com/questions/32214141/aggregate-by-na-in-r
