Question
I have a data set similar to, but much longer and more complex than, the following:
df <- data.frame(ID   = c(1,1,2,2,3,3,3),
                 week = c(20,21,10,15,20,21,22),
                 var1 = c(0,1,0,1,0,0,1))
  ID week var1
1  1   20    0
2  1   21    1
3  2   10    0
4  2   15    1
5  3   20    0
6  3   21    0
7  3   22    1
I would like to create a new data frame that keeps all rows where var1 = 1, and also keeps the row before each of them if it has the same ID and its week is exactly one less than that of the included row. The new data frame would look like this:
  ID week var1
1  1   20    0
2  1   21    1
3  2   15    1
4  3   21    0
5  3   22    1
I have tried subsetting:
df1 <- df[which(df$var1 == 1) - 1, ]
but that gives me the previous row whether or not it meets my criteria.
I have also tried lag from dplyr:
df2 <- filter(df, var1 == 1 & lag(week) == week - 1)
but that gives me only the rows that meet both criteria. All of the code I have found produces one or the other of these results.
Answer 1:
Using SQL we have:
library(sqldf)
sqldf("select b.* from df a join df b on a.ID = b.ID and b.week = a.week - 1
where a.var1 = 1
union
select * from df
where var1 = 1
order by ID, week")
giving
  ID week var1
1  1   20    0
2  1   21    1
3  2   15    1
4  3   21    0
5  3   22    1
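If sqldf is not available, the same self-join idea can be sketched in base R with merge. This is only an illustration of the query above, not part of the original answer; the names hits, wanted, prev and res are made up for the example. Like the SQL, it matches on the week value rather than on row position.

```r
df <- data.frame(ID   = c(1,1,2,2,3,3,3),
                 week = c(20,21,10,15,20,21,22),
                 var1 = c(0,1,0,1,0,0,1))

# Rows with var1 == 1 (the second SELECT in the SQL)
hits <- df[df$var1 == 1, ]

# Keys of the rows to pull in: same ID, week one less than a hit
wanted <- unique(data.frame(ID = hits$ID, week = hits$week - 1))

# Inner join back onto df keeps only previous-week rows that actually exist
prev <- merge(df, wanted, by = c("ID", "week"))

# UNION drops duplicate rows, so do the same here, then sort as the query does
res <- unique(rbind(prev, hits))
res <- res[order(res$ID, res$week), ]
rownames(res) <- NULL
res
```

For the sample data this should return the same five rows as the sqldf call.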
Answer 2:
You can deal with each condition successively:
For your data frame:
df<-data.frame(ID = c(1,1,2,2,3,3,3),
week = c(20,21,10,15,20,21,22),
var1 = c(0,1,0,1,0,0,1))
Each row can be annotated with the conditions it meets:
#   ID week var1
# 1  1   20    0 # <- condition 2 + condition 3
# 2  1   21    1 # <- condition 1
# 3  2   10    0 # <- condition 2
# 4  2   15    1 # <- condition 1
# 5  3   20    0 #
# 6  3   21    0 # <- condition 2 + condition 3
# 7  3   22    1 # <- condition 1
Then select only the rows that meet condition 1, or conditions 2 and 3 together:
## Condition 1: Selecting the rows with var1 = 1
rows_var1 <- which(df$var1 == 1)
rows_var1
# [1] 2 4 7
## Condition 2: Selecting all the previous rows with the same ID
same_ID <- (rows_var1 - 1)[(df$ID[rows_var1] == df$ID[rows_var1 - 1])]
same_ID
# [1] 1 3 6
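## Condition 3: Keeping the previous rows whose week is exactly one less than the week of the row below them
## (indexing with same_ID + 1 keeps both sides the same length even when condition 2 dropped some rows)
same_ID_week <- same_ID[df$week[same_ID] == (df$week[same_ID + 1] - 1)]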
same_ID_week
# [1] 1 6
## Getting the table subset
df1 <- df[sort(c(rows_var1, same_ID_week)),]
#   ID week var1
# 1  1   20    0
# 2  1   21    1
# 4  2   15    1
# 6  3   21    0
# 7  3   22    1
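The three conditions can also be folded into one logical mask, which avoids the `rows_var1 - 1` index becoming 0 when the very first row has var1 == 1. The following is a rough sketch rather than the answerer's method: keep_with_prev is a hypothetical helper name, and it assumes the data frame is already sorted by ID and then week and that var1 contains no NA values.

```r
df <- data.frame(ID   = c(1,1,2,2,3,3,3),
                 week = c(20,21,10,15,20,21,22),
                 var1 = c(0,1,0,1,0,0,1))

# Hypothetical helper: keep var1 == 1 rows plus the row directly above each
# one when that row shares the ID and is exactly one week earlier.
keep_with_prev <- function(d) {
  hit <- d$var1 == 1                 # condition 1

  # Values taken from the row directly below each row (padded for the last row)
  next_hit  <- c(hit[-1], FALSE)
  next_id   <- c(d$ID[-1], NA)
  next_week <- c(d$week[-1], NA)

  # Conditions 2 and 3, evaluated from the point of view of the upper row;
  # FALSE & NA is FALSE, so the padded last row is never selected this way
  before_hit <- next_hit & next_id == d$ID & next_week == d$week + 1

  d[hit | before_hit, ]
}

keep_with_prev(df)
#   ID week var1
# 1  1   20    0
# 2  1   21    1
# 4  2   15    1
# 6  3   21    0
# 7  3   22    1
```

Because the mask has one element per row, the original row order and row names are preserved without a separate sort.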
Source: https://stackoverflow.com/questions/52397798/keeping-a-row-that-meets-one-criterion-and-the-row-above-it-if-it-meets-another