How to obtain the symmetric difference between two DataFrames?

借酒劲吻你 2020-12-02 19:18

In the SparkSQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for symmetric difference. Obviously, a combination of union and

5 answers
  •  忘掉有多难
    2020-12-02 19:43

    I think it could be more efficient to use a left join and then keep only the rows where a column that exists only in df2 is null, i.e. the rows of df1 that have no match in df2.

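    import org.apache.spark.sql.functions.col
    // Left join on the shared keys, then keep the df1 rows that found no match in df2
    df1.join(df2, Seq("some_join_key", "some_other_join_key"), "left")
      .where(col("column_just_present_in_df2").isNull)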
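
    The left join covers one direction only (rows of df1 missing from df2). For both directions, a minimal sketch, assuming the two DataFrames have the same schema, is to take except each way and combine the results with unionAll. The helper name symmetricDifference is hypothetical, not part of the Spark API. Note that except compares whole rows and drops duplicates, so it may not suit cases where rows should be matched on join keys only.

    ```scala
    import org.apache.spark.sql.DataFrame

    // Hypothetical helper (not a Spark API): rows that appear in exactly one input.
    // Assumes df1 and df2 have the same columns in the same order.
    def symmetricDifference(df1: DataFrame, df2: DataFrame): DataFrame =
      df1.except(df2)               // rows only in df1
        .unionAll(df2.except(df1))  // plus rows only in df2
    ```

    On Spark 2.x and later, union can be used in place of unionAll.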
    
