How to get group-level statistics while preserving the original dataframe?

Posted by 给你一囗甜甜゛ on 2019-12-20 06:37:06

Question


I have the following data frame:

one <- c('one',NA,NA,NA,NA,'two',NA,NA)
group1 <- c('A','A','A','A','B','B','B','B')
group2 <- c('C','C','C','D','E','E','F','F')

df <- data.frame(one, group1, group2)


> df
   one group1 group2
1  one      A      C
2 <NA>      A      C
3 <NA>      A      C
4 <NA>      A      D
5 <NA>      B      E
6  two      B      E
7 <NA>      B      F
8 <NA>      B      F

I want to get the count of non-missing observations of one for each combination of group1 and group2, attached to every row of the original data frame.

In Pandas, I would use groupby(['group1','group2']).transform, but how can I do that in R? The original data frame is large.

Expected output is:

> df
   one group1 group2 count
1  one      A      C     1
2 <NA>      A      C     1
3 <NA>      A      C     1
4 <NA>      A      D     0
5 <NA>      B      E     1
6  two      B      E     1
7 <NA>      B      F     0
8 <NA>      B      F     0

Many thanks!


Answer 1:


With data.table:

library(data.table)

setDT(df)  # convert df to a data.table by reference (no copy)
df[, count_B := sum(!is.na(one)), by = c("group1", "group2")]

gives:

   one group1 group2 count_B
1: one      A      C       1
2:  NA      A      C       1
3:  NA      A      C       1
4:  NA      A      D       0
5:  NA      B      E       1
6: two      B      E       1
7:  NA      B      F       0
8:  NA      B      F       0

The idea is to sum the TRUE values (each counts as 1 once converted to integer) of !is.na(one), that is, the rows where one is not NA, while grouping by group1 and group2.
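
The counting trick can be checked on a single group in base R. This is a minimal sketch: x holds the values of one for the group (A, C) from the question, and the names not_missing and n_present are chosen here only for illustration.

```r
# Values of `one` for the group (A, C) in the question's data
x <- c("one", NA, NA)

# Logical vector: TRUE where the value is present, FALSE where it is NA
not_missing <- !is.na(x)

# sum() coerces TRUE to 1 and FALSE to 0, so this counts the non-missing values
n_present <- sum(not_missing)

print(not_missing)
print(n_present)
```

The grouped versions apply this same expression once per combination of group1 and group2.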




Answer 2:


library(dplyr)

df %>% group_by(group1, group2) %>% mutate(count = sum(!is.na(one)))
Source: local data frame [8 x 4]
Groups: group1, group2 [4]

     one group1 group2 count
  <fctr> <fctr> <fctr> <int>
1    one      A      C     1
2     NA      A      C     1
3     NA      A      C     1
4     NA      A      D     0
5     NA      B      E     1
6    two      B      E     1
7     NA      B      F     0
8     NA      B      F     0



Answer 3:


Let's not forget that a lot of things can be done in base R, although sometimes not as efficiently as data.table or dplyr:

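# Flag the non-missing rows first, then sum the flags within each group.
# (as.integer(df$one) only works when one is a factor; on a character column it returns all NA.)
df$count <- ave(as.integer(!is.na(df$one)), df$group1, df$group2, FUN = sum)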
#   one group1 group2 count
#1  one      A      C     1
#2 <NA>      A      C     1
#3 <NA>      A      C     1
#4 <NA>      A      D     0
#5 <NA>      B      E     1
#6  two      B      E     1
#7 <NA>      B      F     0
#8 <NA>      B      F     0


Source: https://stackoverflow.com/questions/39895731/how-to-get-group-level-statistics-while-preserving-the-original-dataframe
