dplyr filtering on multiple columns using “%in%”

Posted by 徘徊边缘 on 2021-02-17 06:38:14

Question


I have a dataframe (df1) with multiple columns (ID, Number, Location, Field, Weight). I also have another dataframe (df2) with more information (ID, PassRate, Number, Weight).

I am trying to use dplyr and %in% to filter out rows in df1 that have the same two values as df2.

So far I have:

df_sub <- subset(df1, df1$ID %in% df2$ID & df1$Weight %in% df2$Weight) 

But this is only subsetting on the first condition...any idea why?


Answer 1:


From the question and sample code, it is unclear whether you want df_sub to contain the rows in df1 which do have matches in df2, or the ones without matches. dplyr::semi_join() will return the rows with matches, dplyr::anti_join() will return the rows without matches.

df_sub <- semi_join(x=df1, y=df2, by=c("ID","Weight")) 

or

df_sub <- anti_join(x=df1, y=df2, by=c("ID","Weight")) 
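As a minimal sketch of the difference between the two joins, here is a small example. The sample data (including the PassRate values) is made up for illustration and is not from the question; only the pair ID = 2, Weight = "b" occurs in both data frames.

```r
library(dplyr)

# Hypothetical sample data: only (ID = 2, Weight = "b") appears in both frames.
df1 <- data.frame(ID = c(1, 2, 3), Weight = c("a", "b", "c"))
df2 <- data.frame(ID = c(1, 2, 4),
                  Weight = c("b", "b", "c"),
                  PassRate = c(0.5, 0.7, 0.9))

# Rows of df1 whose (ID, Weight) pair also occurs in some row of df2.
matched <- semi_join(df1, df2, by = c("ID", "Weight"))

# Rows of df1 whose (ID, Weight) pair occurs in no row of df2.
unmatched <- anti_join(df1, df2, by = c("ID", "Weight"))

print(matched)
print(unmatched)
```

Both functions return only the columns of df1, so PassRate from df2 is not added to the result.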



Answer 2:


Try this,

df1[paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight), ]

Your code checks each column of df1 against the values of df2 independently; it does not check that both values match within the same row of df2.

Try this sample data

df1 
ID  Weight
1   a
2   b


df2 
ID  Weight
1   b
2   a

Using your code:

 df_sub <- subset(df1, df1$ID %in% df2$ID & df1$Weight %in% df2$Weight)


> df_sub
  ID Weight
1  1      a
2  2      b

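Each condition on its own is TRUE for every row of df1, because every ID in df1 appears somewhere in df2$ID and every Weight in df1 appears somewhere in df2$Weight. The combined filter therefore keeps all of df1 (one line per df1 row, one value per condition):

 TRUE  TRUE
 TRUE  TRUE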
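To see where these values come from, the two conditions can be evaluated separately on the sample data above. This is only a sketch in base R; the variable names id_hit, weight_hit and keep are chosen here for illustration.

```r
# Sample data from this answer: no (ID, Weight) pair is shared between the frames.
df1 <- data.frame(ID = c(1, 2), Weight = c("a", "b"))
df2 <- data.frame(ID = c(1, 2), Weight = c("b", "a"))

# Each test looks at a single column and ignores which df2 row the value came from.
id_hit <- df1$ID %in% df2$ID              # is each df1 ID anywhere in df2$ID?
weight_hit <- df1$Weight %in% df2$Weight  # is each df1 Weight anywhere in df2$Weight?

# Combining them with & still keeps every row of df1.
keep <- id_hit & weight_hit
print(data.frame(id_hit, weight_hit, keep))
```

Because the two tests never compare the ID and Weight of the same df2 row, a row of df1 can pass both even though no single row of df2 contains that combination.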

With my approach, no rows match:

 df1[paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight), ]

[1] ID     Weight
<0 rows> (or 0-length row.names)
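One caveat with building the key via paste0(): without a separator, different pairs can collapse into the same string. The sketch below uses hypothetical numeric values (not from the question) to show such a collision, and how passing a separator to paste() avoids it, assuming the separator character never occurs in the data itself.

```r
# Hypothetical values chosen so that the concatenated keys collide.
df1 <- data.frame(ID = 1, Weight = 12)
df2 <- data.frame(ID = 11, Weight = 2)

# Both keys become "112", so the rows look like a match although they differ.
no_sep <- paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight)

# With a separator the keys are "1_12" and "11_2", which no longer collide.
with_sep <- paste(df1$ID, df1$Weight, sep = "_") %in%
  paste(df2$ID, df2$Weight, sep = "_")

print(no_sep)
print(with_sep)
```

The join-based approach from Answer 1 compares the columns directly and so does not depend on a key string at all.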


Source: https://stackoverflow.com/questions/45623451/dplyr-filtering-on-multiple-columns-using-in
