How to obtain the symmetric difference between two DataFrames?

借酒劲吻你 2020-12-02 19:18

In the SparkSQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for symmetric difference. Obviously, a combination of union and

  •  暗喜
    2020-12-02 19:39

    You can always rewrite it as:

    df1.unionAll(df2).except(df1.intersect(df2))
    

    Seriously though, UNION, INTERSECT and EXCEPT / MINUS are pretty much the standard set of SQL combining operators. I am not aware of any system that provides an XOR-like operation out of the box, most likely because it is trivial to implement with the other three and there is not much to optimize there.
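The expression above is the set identity A △ B = (A ∪ B) \ (A ∩ B). The sketch below checks it on plain Scala immutable `Set`s from the standard library, not on Spark DataFrames, and compares it with the other common formulation (A \ B) ∪ (B \ A). It assumes distinct-row semantics, which is what SQL EXCEPT and INTERSECT (without ALL) give you. The names `SymDiffSketch`, `viaUnionExceptIntersect` and `viaTwoExcepts` are hypothetical, chosen only for illustration.

```scala
// Illustrative sketch on standard-library Sets (no Spark involved).
object SymDiffSketch {
  // (A ∪ B) \ (A ∩ B): same shape as df1.unionAll(df2).except(df1.intersect(df2))
  def viaUnionExceptIntersect[A](a: Set[A], b: Set[A]): Set[A] =
    a.union(b).diff(a.intersect(b))

  // (A \ B) ∪ (B \ A): the alternative built from two "except" steps and a union
  def viaTwoExcepts[A](a: Set[A], b: Set[A]): Set[A] =
    a.diff(b).union(b.diff(a))

  def main(args: Array[String]): Unit = {
    val a = Set(1, 2, 3)
    val b = Set(2, 3, 4)
    // Both formulations should produce the same set
    println(viaUnionExceptIntersect(a, b))
    println(viaTwoExcepts(a, b))
  }
}
```

Either form is just a composition of the three standard operators, which is consistent with the point that a dedicated XOR operator adds little.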
