Subset by multiple conditions

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-01 04:02:40

I think ave could be useful here. I call your original data frame 'df'. For each Id, check whether each of the years 2009-2011 is present in Year (2009:2011 %in% x). This gives a logical vector, which can be summed. Test whether the sum equals 3 (if all three years are present, the sum is 3). Because ave returns a vector of the same type as its first argument, the TRUE/FALSE result for each Id comes back as 1/0, so it is converted back to logical with as.logical() before it is used to subset the rows of the data frame.

df[as.logical(ave(df$Year, df$Id, FUN = function(x) sum(2009:2011 %in% x) == 3)), ]
#   Id Year V1
# 1  1 2009 33
# 2  1 2010 67
# 3  1 2011 38
# 7  4 2009 47
# 8  4 2010 51
# 9  4 2011 14
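The type coercion inside ave() is easy to overlook, so here is a small sketch that rebuilds the example data (values copied from the printed output; the name df follows the answer) and shows what the ave() call actually returns and why it has to be turned back into a logical vector before subsetting.

```r
# Example data rebuilt from the printed output above
df <- data.frame(Id   = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
                 Year = c(2009, 2010, 2011, 2009, 2009, 2010, 2009, 2010, 2011),
                 V1   = c(33, 67, 38, 45, 65, 74, 47, 51, 14))

# ave() writes each group's result back into a copy of df$Year,
# so the TRUE/FALSE returned by FUN is stored as numeric 1/0
flag <- ave(df$Year, df$Id, FUN = function(x) sum(2009:2011 %in% x) == 3)
print(flag)

# Used directly as a row index, 1 selects row 1 again and again and 0 drops a row;
# as.logical() turns the flags back into a proper row filter
keep <- as.logical(flag)
df[keep, ]
```

The second answer below avoids the same trap by comparing the ave() result with == 1.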

Another way of using ave

DF
##   Id Year V1
## 1  1 2009 33
## 2  1 2010 67
## 3  1 2011 38
## 4  2 2009 45
## 5  3 2009 65
## 6  3 2010 74
## 7  4 2009 47
## 8  4 2010 51
## 9  4 2011 14


DF[ave(DF$Year, DF$Id, FUN = function(x) all(2009:2011 %in% x)) == 1, ]
##   Id Year V1
## 1  1 2009 33
## 2  1 2010 67
## 3  1 2011 38
## 7  4 2009 47
## 8  4 2010 51
## 9  4 2011 14
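If the same check is needed for other columns or year ranges, the all() version can be wrapped in a small function. The name keep_complete_groups and its arguments are hypothetical, chosen only for illustration; the body is the DF[ave(...) == 1, ] pattern with the column names and required years passed in, using as.logical() instead of == 1 so that a character Year column would also work.

```r
# Hypothetical helper: keep all rows of every group that contains each required year
keep_complete_groups <- function(data, id_col, year_col, required) {
  # ave() returns the type of data[[year_col]], so the flags come back as 1/0
  # (or "TRUE"/"FALSE" for a character column); as.logical() handles both
  has_all <- ave(data[[year_col]], data[[id_col]],
                 FUN = function(x) all(required %in% x))
  data[as.logical(has_all), , drop = FALSE]
}

DF <- data.frame(Id   = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
                 Year = c(2009, 2010, 2011, 2009, 2009, 2010, 2009, 2010, 2011),
                 V1   = c(33, 67, 38, 45, 65, 74, 47, 51, 14))

# Same subset as above: Ids 1 and 4
keep_complete_groups(DF, "Id", "Year", 2009:2011)

# A different range: Ids 1, 3 and 4 all have both 2009 and 2010
keep_complete_groups(DF, "Id", "Year", 2009:2010)
```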
Maciej

This should do the job :)

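library(plyr)
# Count, per Id, how many of the years 2009-2011 are present
# (length(Year) would count every row, including rows from other years)
ds <- ddply(ds, .(Id), mutate, Nobs = sum(2009:2011 %in% Year))
ds[ds$Nobs == 3 & ds$Year %in% 2009:2011, ]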
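Whether Nobs counts every row or only the target years matters when an Id has three rows but not all of them fall in 2009-2011. The sketch below uses made-up data (Id 5 and its years are hypothetical) and base R's ave() in place of plyr, so it runs without extra packages.

```r
# Made-up data: Id 5 has three rows, but 2008 is outside the target range
ds <- data.frame(Id   = c(1, 1, 1, 5, 5, 5),
                 Year = c(2009, 2010, 2011, 2008, 2009, 2010))

# Rows per Id (what length(Year) measures) vs. target years present per Id
n_rows   <- ave(ds$Year, ds$Id, FUN = length)
n_target <- ave(ds$Year, ds$Id, FUN = function(x) sum(2009:2011 %in% x))

# Filtering on the row count lets Id 5 through with its 2009 and 2010 rows
ds[n_rows == 3 & ds$Year %in% 2009:2011, ]

# Filtering on the target-year count keeps only Id 1
ds[n_target == 3, ]
```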

I think an approach using ave is reasonable, but there are lots of ways to solve this problem. I'll show a few other ways using base R, and then in the last two examples I'll introduce the data.table package.

Again, I'm just throwing this out there to show some options that use different aspects of the language.

d1 <- data.frame(ID=c(1,1,1,2,3,3,4,4,4), Year=c(2009,2010,2011, 2009,2009, 2010, 2009, 2010, 2011), V1=c(33, 67, 38, 45, 65, 74, 47, 51, 14))


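# long way
use_years <- as.character(2009:2011)
# Number of rows per ID (rows) and Year (columns), restricted to the target years
cnts <- table(d1[, c("ID", "Year")])[, use_years]
# Count distinct years present (cnts > 0) so duplicated rows cannot inflate the total
use_id <- rownames(cnts)[rowSums(cnts > 0) == length(use_years)]
d1[d1[, "ID"] %in% use_id, ]
#   ID Year V1
# 1  1 2009 33
# 2  1 2010 67
# 3  1 2011 38
# 7  4 2009 47
# 8  4 2010 51
# 9  4 2011 14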
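To see why the table-based version should count distinct years rather than sum raw counts, consider made-up data in which ID 2 has two 2009 rows and one 2010 row but no 2011. Note also that the [, use_years] step assumes every target year occurs somewhere in the data; otherwise the column subscript is out of bounds.

```r
# Made-up data: ID 2 repeats 2009 and never has 2011
d <- data.frame(ID   = c(1, 1, 1, 2, 2, 2),
                Year = c(2009, 2010, 2011, 2009, 2009, 2010))

use_years <- as.character(2009:2011)
# Row = ID, column = Year, cell = number of rows with that combination
cnts <- table(d[, c("ID", "Year")])[, use_years, drop = FALSE]
print(cnts)

# Summing raw counts gives 3 for both IDs, a false positive for ID 2
rowSums(cnts)

# Counting cells greater than zero gives the number of distinct years present
rowSums(cnts > 0)
rownames(cnts)[rowSums(cnts > 0) == length(use_years)]
```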

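# another longish way
ind1 <- d1[, "Year"] %in% 2009:2011
d1_ind <- d1[ind1, "ID"]
# tabulate() stores the count for ID k at position k, so which() maps the counts
# back to ID values (this assumes the IDs are positive integers)
ind2 <- d1_ind %in% which(tabulate(d1_ind) == 3)
d1[ind1, ][ind2, ]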
#   ID Year V1
# 1  1 2009 33
# 2  1 2010 67
# 3  1 2011 38
# 7  4 2009 47
# 8  4 2010 51
# 9  4 2011 14
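The tabulate() step is order sensitive: tabulate() returns the count for bin k at position k, whereas unique() lists IDs in the order they first appear. With the example data both orders happen to agree, but with made-up, unsorted IDs they do not, which is why which() is the safer way to turn the counts back into IDs.

```r
# Made-up IDs, not sorted: ID 3 appears before ID 1
ids <- c(3, 3, 1, 1, 1, 2)

counts <- tabulate(ids)   # position k holds the count for ID k
print(counts)
print(unique(ids))        # first-appearance order

# Pairing the two misaligns them and picks ID 3, which has only 2 rows
unique(ids)[counts == 3]

# which() returns the bin positions, i.e. the IDs themselves (for positive integer IDs)
which(counts == 3)
```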

OK, let's try out a couple of methods using data.table, one of my favorite packages of all time. It can be a little tricky at first, though, so make sure your boots are on tight. (Oh, yeah, and it's fast!) :)

# medium way
library(data.table)
d2 <- as.data.table(d1)

# Find IDs with exactly 3 rows in 2009-2011 (.N is the group size), then keep all their rows
d2[ID %in% d2[Year %in% 2009:2011, list(logic = .N == 3), by = "ID"][(logic), ID]]
#    ID Year V1
# 1:  1 2009 33
# 2:  1 2010 67
# 3:  1 2011 38
# 4:  4 2009 47
# 5:  4 2010 51
# 6:  4 2011 14


# short way
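# Keep each ID whose rows within 2009-2011 number exactly 3
# (pairing unique(ID) with table(ID) only lines up when the data are sorted by ID)
d2[Year %in% 2009:2011][, if (.N == 3L) .SD, by = ID]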
#    ID Year V1
# 1:  1 2009 33
# 2:  1 2010 67
# 3:  1 2011 38
# 4:  4 2009 47
# 5:  4 2010 51
# 6:  4 2011 14