Extract id with matching pattern on several rows in dataframe

Question

Here is an example of the dataframe I'm working on:

id  string
1    no
1    yes
1    yes
2    no
2    yes
3    yes
3    yes
3    no

I want to extract the ids for which the last two rows both contain "yes" in the column string.

So the result would be:

id   string
 1    yes
 1    yes

So I would get only one id, namely 1.

I tried to do this with a for loop, but since I have more than 200,000 rows, the loop takes too long: more than 5 minutes.

I tried this:

vec_id <- unique(df$id)
res_id <- c()

for (id in vec_id) {
  # last two values of 'string' for the current id
  last2 <- tail(df[df$id == id, "string"], 2)
  if (length(last2) == 2 && all(last2 == "yes")) {
    res_id <- append(res_id, id)
  }
}

Are there any functions or approaches that would do this faster?

Answer 1:

We can use data.table. Convert the data.frame to a data.table with setDT(df1), group by id, and, if the last two values of string are both "yes", return those two values (using tail).

library(data.table)
setDT(df1)[, if(all(tail(string,2)=="yes")) .(string = tail(string,2)) , id]
#  id string
#1:  1    yes
#2:  1    yes

NOTE: The general form of a data.table call is DT[i, j, by]: subset the rows with i, then compute j, grouped by by.
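If only the ids are needed, rather than the matching rows, the same grouping idea can be reduced to one logical flag per id. The following is a minimal sketch, not part of the original answer: the names dt, flags and ids are hypothetical, and the example data is rebuilt from the question so that the block runs on its own.

```r
library(data.table)

# Example data from the question, rebuilt here so the sketch is self-contained
dt <- data.table(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  string = c("no", "yes", "yes", "no", "yes", "yes", "yes", "no")
)

# One row per id: TRUE when the group has at least two rows
# and its last two values of 'string' are both "yes"
flags <- dt[, .(ok = .N >= 2 && all(tail(string, 2) == "yes")), by = id]

# Keep only the matching ids as a plain vector
ids <- flags[ok == TRUE, id]
ids
```

The .N >= 2 check is an added assumption about the intended behaviour: an id with a single "yes" row is treated as not matching.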

Answer 2:

A base R alternative is to use split and lapply (with unlist) to construct a logical vector that can be used to perform the row subsetting:

dropper <- unlist(lapply(split(df$string, df$id),
                         FUN=function(i) c(rep(FALSE, length(i) - 2),
                                           rep(all(tail(i, 2) =="yes"), 2))),
                  use.names=FALSE)
dropper
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

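Here, split splits df$string into a list by df$id, which lapply feeds to an anonymous function. For a group of n rows, the function returns FALSE for the first n-2 elements, followed by either TRUE TRUE or FALSE FALSE for the final two elements, depending on whether both are "yes". Note that this assumes df is sorted by id and that every id has at least two rows; otherwise the vector will not line up with the rows, or rep will fail on a negative count.

Then use the vector to drop the unwanted observations: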

 df[dropper,]
  id string
2  1    yes
3  1    yes
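A variant of the same idea that does not depend on the row order and tolerates ids with fewer than two rows can be sketched with ave, which applies a function per group and writes the results back in the original row positions. This is a sketch under assumptions rather than part of the original answer: the helper last_two_yes and the variables is_yes and keep are hypothetical names, and the data is rebuilt from the question.

```r
# Example data from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  string = c("no", "yes", "yes", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Logical version of the column: TRUE where string is "yes"
is_yes <- df$string == "yes"

# Hypothetical helper: for one group, flag only its last two positions,
# and only when the group has at least two rows that both end in TRUE
last_two_yes <- function(x) {
  n <- length(x)
  flag <- n >= 2 && all(tail(x, 2))
  (seq_len(n) > n - 2) & flag
}

# ave() applies the helper per id and keeps the original row order
keep <- ave(is_yes, df$id, FUN = last_two_yes)

df[keep, ]            # the matching rows
unique(df$id[keep])   # just the matching ids
```

Because ave returns a vector aligned with the rows of df, keep can be used directly for subsetting even when the rows of one id are not contiguous.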

Source: https://stackoverflow.com/questions/42860953/extract-id-with-matching-pattern-on-several-rows-in-dataframe

Tags

string

dataframe

match