How to LEFT ANTI join under some matching condition

别等时光非礼了梦想. Submitted on 2020-02-01 18:36:17

Question


I have two tables. The first holds the core data: a pair of IDs (PC1 and P2) and some blob data (P3). The second is a blacklist of ID pairs (P1 and B1) to be excluded from the first table. I will call the first table in_df and the second blacklist_df.

What I want to do is remove every row from in_df for which in_df.PC1 == blacklist_df.P1 and in_df.P2 == blacklist_df.B1. Here is a code snippet that shows more explicitly what I want to achieve.

in_df = sqlContext.createDataFrame(
    [[1, 2, 'A'], [2, 1, 'B'], [3, 1, 'C'], [4, 11, 'D'], [1, 3, 'D']],
    ['PC1', 'P2', 'P3'])
in_df.show()

+---+---+---+
|PC1| P2| P3|
+---+---+---+
|  1|  2|  A|
|  2|  1|  B|
|  3|  1|  C|
|  4| 11|  D|
|  1|  3|  D|
+---+---+---+

blacklist_df = sqlContext.createDataFrame([[1,2],[2,1]],['P1','B1'])
blacklist_df.show()

+---+---+
| P1| B1|
+---+---+
|  1|  2|
|  2|  1|
+---+---+

In the end, what I want to get is the following:

+---+--+--+
|PC1|P2|P3|
+---+--+--+
|  1| 3| D|
|  3| 1| C|
|  4|11| D|
+---+--+--+

I tried a LEFT_ANTI join, but I haven't been able to get it to work. Thanks!


Answer 1:


Pass the join conditions as a list to the join function, and specify how='left_anti' as the join type:

in_df.join(
    blacklist_df, 
    [in_df.PC1 == blacklist_df.P1, in_df.P2 == blacklist_df.B1], 
    how='left_anti'
).show()

+---+---+---+
|PC1| P2| P3|
+---+---+---+
|  1|  3|  D|
|  4| 11|  D|
|  3|  1|  C|
+---+---+---+
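
To make the semantics concrete, here is a minimal plain-Python sketch of what a left anti join on both conditions does: it keeps each left-hand row whose (PC1, P2) pair has no matching (P1, B1) pair in the blacklist. No Spark is involved, the helper name left_anti is hypothetical, and the data is copied from the question. Spark does not guarantee the order of rows in a join result, which is why the rows above appear in a different order than in the question's expected table.

```python
# Plain-Python illustration of left anti join semantics (not Spark code).
# Data copied from the question; `left_anti` is a hypothetical helper name.
in_rows = [(1, 2, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 11, 'D'), (1, 3, 'D')]
blacklist_rows = [(1, 2), (2, 1)]


def left_anti(left, right):
    """Keep rows of `left` whose (PC1, P2) pair has no match among the
    (P1, B1) pairs in `right`. Both conditions must hold for a match."""
    blocked = set(right)
    return [row for row in left if (row[0], row[1]) not in blocked]


result = left_anti(in_rows, blacklist_rows)
print(result)  # [(3, 1, 'C'), (4, 11, 'D'), (1, 3, 'D')]
```

Note that (3, 1, 'C') survives even though P2 == 1 appears in the blacklist, because a row is dropped only when both columns match the same blacklist entry.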


Source: https://stackoverflow.com/questions/51343937/how-to-left-anti-join-under-some-matching-condition
