data.table sum and subset

Submitted by 隐身守侯 on 2021-01-18 05:29:25

Question


I have a data.table that I want to aggregate:

library(data.table)
dt1 <- data.table(year=c("2001","2001","2001","2002","2002","2002","2002"),
                  group=c("a","a","b","a","a","b","b"), 
                  amt=c(20,40,20,35,30,28,19))

I want to sum amt by year and group, and then keep only the groups whose total amt across all years is greater than 100.

I've got the data.table sum nailed.

dt1[, sum(amt),by=list(year,group)]

   year group V1
1: 2001     a 60
2: 2001     b 20
3: 2002     a 65
4: 2002     b 47

I am having trouble with my final level of filtering.

The end outcome I am looking for is:

   year group V1
1: 2001     a 60
2: 2002     a 65

This is because for group a, 60 + 65 > 100, whereas for group b, 20 + 47 <= 100.

Any thoughts on how to achieve this would be great.

I had a look at "data.table sum by group and return row with max value" and was wondering whether there is an equally elegant solution to my problem.


Answer 1:


A one-liner in data.table:

dt1[, lapply(.SD,sum), by=.(year,group)][, if (sum(amt) > 100) .SD, by=group]

#   group year amt
#1:     a 2001  60
#2:     a 2002  65
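
The one-liner above chains two grouped operations. As a minimal sketch of how it can be read, the same logic is split into two named steps below; the intermediate name `by_year_group` and the explicit `amt = sum(amt)` column naming are illustrative choices, not part of the original answer.

```r
library(data.table)

dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Step 1: sum amt within each year/group pair.
# (by_year_group is a hypothetical name used only for this sketch.)
by_year_group <- dt1[, .(amt = sum(amt)), by = .(year, group)]

# Step 2: for each group, return its rows only if the group total exceeds 100.
# When the condition is FALSE, `if` without `else` yields NULL,
# so that group contributes no rows to the result.
result <- by_year_group[, if (sum(amt) > 100) .SD, by = group]
print(result)
```

The `if (condition) .SD` form inside `j` works as a group-level filter: a group is either returned whole or dropped.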



Answer 2:


You can do:

library(dplyr)
dt1 %>% 
  group_by(group, year) %>% 
  summarise(amt = sum(amt)) %>%
  filter(sum(amt) > 100)

Which gives:

#Source: local data table [2 x 3]
#Groups: group
#
#  year group amt
#1 2001     a  60
#2 2002     a  65



Answer 3:


This might not be an ideal solution, but I would do it in several steps, as follows:

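# Sum amt by year and group.
dt2 <- dt1[, sum(amt), by = list(year, group)]
# Flag the groups whose overall total exceeds 100.
dt3 <- dt1[, sum(amt) > 100, by = list(group)]
# Keep only the year/group rows that belong to flagged groups.
dt_result <- dt2[group %in% dt3[V1 == TRUE]$group, ]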



Answer 4:


Here's a two-liner. Find the subset of groups you want first, then aggregate only those:

big_groups <- dt1[,sum(amt),by=group][V1>100]$group
dt1[group%in%big_groups,sum(amt),by=list(year,group)]
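
A variation on the same two-step idea, offered only as a sketch, keeps the group total as a column instead of a separate vector of group names. The names `agg`, `group_total`, and `kept` are hypothetical and introduced here for illustration.

```r
library(data.table)

dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Sum amt within each year/group pair.
agg <- dt1[, .(amt = sum(amt)), by = .(year, group)]

# Add each group's overall total as a new column, by reference.
agg[, group_total := sum(amt), by = group]

# Keep only the rows whose group total exceeds 100.
kept <- agg[group_total > 100]
print(kept)
```

Carrying `group_total` along makes the filter condition visible in the result, at the cost of an extra column that can be dropped afterwards with `kept[, group_total := NULL]`.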


Source: https://stackoverflow.com/questions/30180590/data-table-sum-and-subset
