In the SparkSQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for difference. Obviously, a combination of union and except can be used to generate the difference, but it seems a bit awkward.
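For reference, the union-plus-except combination could look like the sketch below. This assumes Spark 1.6 in local mode; the DataFrames `df1`/`df2`, their column names, and their contents are invented for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical local setup and example data (Spark 1.6 API)
val sc = new SparkContext(new SparkConf().setAppName("diff").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
val df2 = Seq((2, "b"), (3, "c"), (4, "d")).toDF("id", "value")

// Symmetric difference: rows that appear in exactly one of the two DataFrames.
// Note: in the 1.6 DataFrame API the union operation is spelled unionAll.
val diff = df1.except(df2).unionAll(df2.except(df1))
```

Here `diff` should contain the rows `(1, "a")` and `(4, "d")`; note that `except` also deduplicates its result, which may or may not be what you want.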
I think it could be more efficient to use a left join and then filter out the rows where the joined columns are null.
df1.join(df2, Seq("some_join_key", "some_other_join_key"), "left")
   .where(col("column_just_present_in_df2").isNull)
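A self-contained sketch of this left-join-plus-isNull pattern, again assuming Spark 1.6 in local mode with invented DataFrames and column names:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.col

// Hypothetical local setup and example data (Spark 1.6 API)
val sc = new SparkContext(new SparkConf().setAppName("antijoin").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "v")
val df2 = Seq((2, "x"), (3, "y")).toDF("id", "flag")

// Keep only df1 rows with no match in df2 (an anti-join),
// then drop df2's columns from the result.
// Caveat: "flag" must be a column that is never null in df2 itself,
// otherwise matched rows whose flag is null would wrongly survive the filter.
val onlyInDf1 = df1
  .join(df2, Seq("id"), "left")
  .where(col("flag").isNull)
  .select(df1.columns.map(col): _*)
```

Note that this computes the one-directional difference (rows of df1 absent from df2), which is what `except` does; for the symmetric difference you would have to apply it in both directions and union the results.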