Keeping a row that meets one criterion and the row above it if it meets another

Question

I have a data set similar to, but much longer and more complex than, the following:

df<-data.frame(ID = c(1,1,2,2,3,3,3), 
               week = c(20,21,10,15,20,21,22), 
               var1 = c(0,1,0,1,0,0,1))

  ID week var1
1  1   20    0
2  1   21    1
3  2   10    0
4  2   15    1
5  3   20    0
6  3   21    0
7  3   22    1

I would like to create a new data frame that keeps all rows where var1 = 1 and also keeps the row immediately before each of them, but only if that row has the same ID and its week is exactly one less than the week of the kept row. The new data frame would look like this:

  ID week var1
1  1   20    0
2  1   21    1
3  2   15    1
4  3   21    0
5  3   22    1

I have tried to subset

df1<-df[which(df$var1 == 1) - 1, ]

but that gives me the previous row whether it meets my criteria or not.

I have also tried lag in dplyr

df2<-filter(df, var1==1 & lag(week)==week-1)

but that gives me only the rows that meet both criteria. Every approach I have found produces one or the other of these results.

Answer 1:

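Using SQL (via the sqldf package), a self-join picks up the qualifying previous rows and a union adds the rows where var1 = 1: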

library(sqldf)

sqldf("select b.* from df a join df b on a.ID = b.ID and b.week = a.week - 1
       where a.var1 = 1
       union
       select * from df 
       where var1 = 1
       order by ID, week")

giving

  ID week var1
1  1   20    0
2  1   21    1
3  2   15    1
4  3   21    0
5  3   22    1
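
For readers who prefer to avoid the sqldf dependency, the same self-join idea can be sketched in base R with merge(). This is only an illustration of the query's logic, not part of the original answer; the names hits, wanted, prev and res are made up for the example.

```r
df <- data.frame(ID = c(1,1,2,2,3,3,3),
                 week = c(20,21,10,15,20,21,22),
                 var1 = c(0,1,0,1,0,0,1))

# Rows with var1 == 1 (the second SELECT in the query)
hits <- df[df$var1 == 1, ]

# Keys of the rows being looked for: same ID, week one less than a hit
wanted <- data.frame(ID = hits$ID, week = hits$week - 1)

# Inner join back onto df (the self-join in the first SELECT)
prev <- merge(df, wanted, by = c("ID", "week"))

# UNION drops duplicate rows, ORDER BY sorts by ID and week
res <- unique(rbind(prev, hits))
res <- res[order(res$ID, res$week), ]
rownames(res) <- NULL
res
```

Because the match is done on the ID and week values rather than on row positions, this version does not depend on the order of the rows in df.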

Answer 2:

You can handle the conditions one at a time.

Starting from your data frame:

df<-data.frame(ID = c(1,1,2,2,3,3,3), 
               week = c(20,21,10,15,20,21,22), 
               var1 = c(0,1,0,1,0,0,1))

Each row is annotated below with the conditions it satisfies:

#   ID week var1
# 1  1   20    0 # <- condition 2 + condition 3
# 2  1   21    1 # <- condition 1
# 3  2   10    0 # <- condition 2
# 4  2   15    1 # <- condition 1
# 5  3   20    0 #
# 6  3   21    0 # <- condition 2 + condition 3
# 7  3   22    1 # <- condition 1

The goal is to keep the rows that meet condition 1, plus the rows that meet both condition 2 and condition 3:

## Condition 1: Selecting the rows with var1 = 1
rows_var1 <- which(df$var1 == 1)
rows_var1
# [1] 2 4 7

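## Condition 2: Selecting all the previous rows with the same ID
## (row 1 has no previous row, so it is dropped from the candidates)
cand <- rows_var1[rows_var1 > 1]
same_ID <- (cand - 1)[df$ID[cand] == df$ID[cand - 1]]
same_ID
# [1] 1 3 6

## Condition 3: Of those, selecting the rows whose week equals the following row's week - 1
## (same_ID + 1 is the matching var1 row, which keeps both vectors the same length)
same_ID_week <- same_ID[df$week[same_ID] == (df$week[same_ID + 1] - 1)]
same_ID_week
# [1] 1 6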

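## Getting the table subset
## (unique() avoids repeating a row that is both a var1 row and a previous row)
df1 <- df[sort(unique(c(rows_var1, same_ID_week))), ]
rownames(df1) <- NULL  # renumber the rows 1..n
df1
#   ID week var1
# 1  1   20    0
# 2  1   21    1
# 3  2   15    1
# 4  3   21    0
# 5  3   22    1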
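
The three conditions can also be written as one logical test on each row by looking at the row that follows it. The sketch below is an alternative, not the method above; it assumes the data frame is already sorted by ID and week, and the names is_hit, next_hit, next_ID, next_week, is_prev, keep and df2 are made up for the example.

```r
df <- data.frame(ID = c(1,1,2,2,3,3,3),
                 week = c(20,21,10,15,20,21,22),
                 var1 = c(0,1,0,1,0,0,1))

# Condition 1: the row itself has var1 == 1
is_hit <- df$var1 == 1

# Shift each column up by one, so position i holds the value from row i + 1
# (the last row has no following row, hence the FALSE / NA padding)
next_hit  <- c(is_hit[-1], FALSE)
next_ID   <- c(df$ID[-1], NA)
next_week <- c(df$week[-1], NA)

# Conditions 2 + 3: the next row is a hit, has the same ID,
# and its week is exactly one more than this row's week
is_prev <- next_hit & next_ID == df$ID & next_week == df$week + 1
is_prev[is.na(is_prev)] <- FALSE  # treat missing comparisons as "no match"

keep <- is_hit | is_prev
df2 <- df[keep, ]
rownames(df2) <- NULL
df2
```

Working with logical vectors of length nrow(df) keeps every comparison aligned row by row, and a row that qualifies for both reasons is still selected only once.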

Source: https://stackoverflow.com/questions/52397798/keeping-a-row-that-meets-one-criterion-and-the-row-above-it-if-it-meets-another

Tags

subset