dplyr filtering on multiple columns using “%in%”

徘徊边缘 submitted on 2021-02-17 06:38:14


I have a dataframe (df1) with multiple columns (ID, Number, Location, Field, Weight). I also have another dataframe (df2) with more information (ID, PassRate, Number, Weight).

I am trying to use dplyr and %in% to filter out the rows in df1 whose ID and Weight values both match the same row of df2.

So far I have:

df_sub <- subset(df1, df1$ID %in% df2$ID & df1$Weight %in% df2$Weight) 

But this is only subsetting on the first condition. Any idea why?


From the question and sample code, it is unclear whether you want df_sub to contain the rows in df1 which do have matches in df2, or the ones without matches. dplyr::semi_join() will return the rows with matches, dplyr::anti_join() will return the rows without matches.

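library(dplyr)

# rows of df1 whose (ID, Weight) pair also occurs in df2
df_sub <- semi_join(x = df1, y = df2, by = c("ID", "Weight"))

# rows of df1 whose (ID, Weight) pair does not occur in df2
df_sub <- anti_join(x = df1, y = df2, by = c("ID", "Weight"))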
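To see the difference between the two joins and the column-by-column %in% test, here is a small sketch on made-up toy data (the values are hypothetical, chosen so that ID 2 and Weight 20 each occur in df2 but never in the same row):

```r
library(dplyr)

# Hypothetical toy data: only the (ID = 1, Weight = 10) pair occurs in both tables.
df1 <- data.frame(ID = c(1, 2, 3), Weight = c(10, 20, 30))
df2 <- data.frame(ID = c(1, 2, 4), Weight = c(10, 30, 20))

# Column-by-column %in% keeps rows 1 and 2: ID 2 and Weight 20 each
# appear somewhere in df2, just not in the same row.
naive <- subset(df1, ID %in% df2$ID & Weight %in% df2$Weight)

# semi_join() matches on the whole (ID, Weight) pair: only row 1 survives.
matched <- semi_join(x = df1, y = df2, by = c("ID", "Weight"))

# anti_join() returns the complement of semi_join(): rows 2 and 3.
unmatched <- anti_join(x = df1, y = df2, by = c("ID", "Weight"))

print(naive)
print(matched)
print(unmatched)
```

Together, semi_join() and anti_join() split df1 into two disjoint parts, so whichever reading of the question is intended, one of them gives the answer.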


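Try this, which compares whole (ID, Weight) pairs by pasting each row's two values into a single key:

# the "_" separator keeps pairs such as (1, "12") and (11, "2") from producing the same key
df1[paste(df1$ID, df1$Weight, sep = "_") %in% paste(df2$ID, df2$Weight, sep = "_"), ]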

What you are doing is filtering df1 on each column's values separately, not finding rows whose ID and Weight both match the same row of df2.
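Since the question asks about dplyr and %in% specifically, the same row-key idea can also be written inside dplyr::filter(). This is a sketch on hypothetical toy data; the "_" separator is an arbitrary choice and is assumed never to occur in the ID or Weight values:

```r
library(dplyr)

# Hypothetical toy data: only the (ID = 1, Weight = 10) pair occurs in both tables.
df1 <- data.frame(ID = c(1, 2, 3), Weight = c(10, 20, 30))
df2 <- data.frame(ID = c(1, 2, 4), Weight = c(10, 30, 20))

# One key per row of df2, so %in% compares whole (ID, Weight) pairs.
key2 <- paste(df2$ID, df2$Weight, sep = "_")

# Inside filter(), ID and Weight refer to df1's columns.
kept    <- filter(df1, paste(ID, Weight, sep = "_") %in% key2)   # pair found in df2
dropped <- filter(df1, !paste(ID, Weight, sep = "_") %in% key2)  # pair not found in df2

print(kept)
print(dropped)
```

Note that `!x %in% y` parses as `!(x %in% y)`, because %in% binds more tightly than `!`.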

Try this sample data, with df1:

ID  Weight
1   a
2   b

and df2:

ID  Weight
1   b
2   a
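To make the comparison reproducible, the two sample tables can be built directly in R. The column types here are an assumption read off the printout (numeric ID, character Weight):

```r
# Sample data from above: df1 first, df2 second.
# Each ID and each Weight value of df1 occurs in df2, but no (ID, Weight) pair does.
df1 <- data.frame(ID = c(1, 2), Weight = c("a", "b"), stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2), Weight = c("b", "a"), stringsAsFactors = FALSE)
```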

Using your code:

df_sub <- subset(df1, df1$ID %in% df2$ID & df1$Weight %in% df2$Weight)

> df_sub
  ID Weight
1  1      a
2  2      b

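Both rows come back because each %in% test is evaluated on its own column. Every individual value in df1 appears somewhere in df2, so both conditions are TRUE for every row:

> df1$ID %in% df2$ID
[1] TRUE TRUE
> df1$Weight %in% df2$Weight
[1] TRUE TRUE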

Using mine, no rows match:

df1[paste(df1$ID, df1$Weight, sep = "_") %in% paste(df2$ID, df2$Weight, sep = "_"), ]

[1] ID     Weight
<0 rows> (or 0-length row.names)
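One caveat when building a row key this way: the values must be joined with a separator. paste0() glues them together with nothing in between, so two different pairs can collapse to the same string and produce a false match. A minimal illustration with made-up values:

```r
# Two different (ID, Weight) pairs (hypothetical values).
ids     <- c(1, 11)
weights <- c("12", "2")

# With no separator, both pairs become "112".
glued <- paste0(ids, weights)

# With a separator they stay distinct: "1_12" and "11_2".
keyed <- paste(ids, weights, sep = "_")

print(glued)
print(keyed)
```

A separator only helps if it never appears inside the data itself; semi_join() and anti_join() avoid the issue entirely because they compare the columns directly rather than through a pasted string.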

