Extract id with matching pattern on several rows in dataframe

Submitted by 夙愿已清 on 2020-01-24 09:39:06

Question


Here is an example of the data frame I'm working on:

id  string
1    no
1    yes
1    yes
2    no
2    yes
3    yes
3    yes
3    no
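To make the example reproducible, the sample data can be built in R as follows. The column names id and string come from the question. stringsAsFactors = FALSE is set explicitly because R versions before 4.0.0 convert character columns to factors by default.

```r
# Sample data from the question: 8 rows, three ids
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3, 3),
  string = c("no", "yes", "yes", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE  # keep 'string' as character, not factor
)
df
```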

I want to extract the ids whose last two rows both contain the string "yes" in the column string.

So the result would be:

id   string
 1    yes
 1    yes

So only one id would be returned: 1.

I tried to do this with a for loop, but since I have more than 200,000 rows, the loop takes too long: more than 5 minutes.

I tried this:

vec_id <- unique(df$id)
ids_found <- c()  # ids whose last two rows are both "yes"

for (id in vec_id) {
  last_two <- tail(df[df$id == id, "string"], 2)
  if (length(last_two) == 2 && all(last_two == "yes")) {
    ids_found <- c(ids_found, id)
  }
}

Is there a function or a faster way to do this?


Answer 1:


We can use data.table. Convert the data.frame to a data.table with setDT(df1), group by id, and, if the last two values of string are all "yes", return those last two values (using tail).

library(data.table)
setDT(df1)[, if(all(tail(string,2)=="yes")) .(string = tail(string,2)) , id]
#  id string
#1:  1    yes
#2:  1    yes

Note: the general form of a data.table call is DT[i, j, by]. Here i is empty, j is the if(...) expression, and by is id.
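The question also asks for just the ids rather than the rows. A minimal base R sketch of the same per-group test, using tapply instead of data.table (the sample data is rebuilt so the block runs on its own). Unlike the one-liner above, it also requires each id to have at least two rows, so an id with a single "yes" row is not reported.

```r
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3, 3),
  string = c("no", "yes", "yes", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per id: does the group have at least two rows,
# and are its last two 'string' values both "yes"?
last_two_yes <- tapply(df$string, df$id,
                       function(s) length(s) >= 2 && all(tail(s, 2) == "yes"))

# tapply labels its result with the group values as character strings,
# so convert the matching labels back to numbers
ids <- as.numeric(names(last_two_yes)[last_two_yes])
ids
# [1] 1
```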




Answer 2:


A base R alternative is to use split and lapply (with unlist) to construct a logical vector that can be used to perform the row subsetting:

dropper <- unlist(lapply(split(df$string, df$id),
                         FUN=function(i) c(rep(FALSE, length(i) - 2),
                                           rep(all(tail(i, 2) =="yes"), 2))),
                  use.names=FALSE)
dropper
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Here, split splits df$string into a list by df$id, and lapply passes each element of that list to an anonymous function. For an id with n rows, the function returns FALSE for the first n-2 elements and then either TRUE TRUE or FALSE FALSE for the final two elements, depending on whether they are both "yes".

Then use the vector to drop the unwanted observations:

 df[dropper,]
  id string
2  1    yes
3  1    yes
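The split/lapply vector lines up with the rows only when df is sorted by id, because unlist returns the groups in sorted id order rather than row order, and it fails for an id with a single row, since rep(FALSE, -1) is an error. A minimal sketch that avoids both assumptions uses base R's ave(), which returns per-group results aligned with the original rows. last_two_yes_mask is a hypothetical helper name, not part of the answer, and "last two" still means the last two rows of each id in their current order.

```r
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3, 3),
  string = c("no", "yes", "yes", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for the last two rows of each id whose last
# two 'string' values are both "yes"; ids need not be sorted or contiguous
last_two_yes_mask <- function(df) {
  idx <- seq_along(df$id)
  # 1 for the last row of each id, 2 for the second-to-last, and so on
  pos_from_end <- ave(idx, df$id, FUN = function(x) rev(seq_along(x)))
  in_last_two <- pos_from_end <= 2
  # size of each id group, repeated on every row of that group
  group_size <- ave(idx, df$id, FUN = length)
  # how many of the (at most two) last rows of each id are "yes"
  n_yes <- ave(as.integer(in_last_two & df$string == "yes"), df$id, FUN = sum)
  in_last_two & group_size >= 2 & n_yes == 2
}

keep <- last_two_yes_mask(df)
df[keep, ]              # same two rows as df[dropper,]
unique(df$id[keep])     # the matching ids
```

Because every step is a vectorized group operation, this avoids subsetting the whole data frame once per id, which is what makes the loop in the question slow.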


Source: https://stackoverflow.com/questions/42860953/extract-id-with-matching-pattern-on-several-rows-in-dataframe
