Compare two Spark dataframes

再見小時候 2020-12-13 15:38

Spark dataframe 1:

+------+-------+---------+----+---+-------+
|city  |product|date     |sale|exp|wastage|
+------+-------+---------+----+---+-------+
|cit         


        
5 Answers
  •  失恋的感觉
    2020-12-13 16:14

    I am not sure about finding the deleted and modified records, but you can use the except function to get the difference:

    df2.except(df1)
    

    This returns the rows of df2 that are not present in df1: rows that were added in df2, plus modified rows (shown with their new values). Duplicate rows are removed. Output:

    +------+-------+---------+----+---+-------+
    |  city|product|     date|sale|exp|wastage|
    +------+-------+---------+----+---+-------+
    |city 3| prod 4|9/18/2017| 230|431|    169|
    |city 1| prod 4|9/27/2017| 350| 90|    190|
    |city 1| prod 3|9/9/2017 | 230|430|    160|
    +------+-------+---------+----+---+-------+
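    A minimal sketch of except in both directions, assuming Spark 2.4+ is on the classpath (or that the code is pasted into spark-shell) and using made-up rows rather than the question's data. The value names addedOrNew, deletedOrOld and dup are illustrative only. Running except the other way round, df1.except(df2), returns the deleted rows plus the old versions of modified rows, and exceptAll keeps duplicates where except drops them.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimenting; in spark-shell getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("except-demo").getOrCreate()
import spark.implicits._

val cols = Seq("city", "product", "date", "sale", "exp", "wastage")

// Hypothetical rows, not the data from the question.
val df1 = Seq(
  ("city 1", "prod 1", "9/1/2017", 100, 200, 10), // unchanged in df2
  ("city 1", "prod 2", "9/2/2017", 150, 250, 20), // modified in df2
  ("city 2", "prod 1", "9/3/2017", 300, 400, 30)  // deleted from df2
).toDF(cols: _*)

val df2 = Seq(
  ("city 1", "prod 1", "9/1/2017", 100, 200, 10), // unchanged
  ("city 1", "prod 2", "9/2/2017", 175, 250, 20), // modified: sale 150 -> 175
  ("city 3", "prod 4", "9/4/2017", 500, 600, 50)  // added
).toDF(cols: _*)

// Rows of df2 absent from df1: added rows plus the NEW version of modified rows.
val addedOrNew = df2.except(df1)
// Rows of df1 absent from df2: deleted rows plus the OLD version of modified rows.
val deletedOrOld = df1.except(df2)

addedOrNew.show(false)
deletedOrOld.show(false)

// except behaves like SQL EXCEPT DISTINCT and collapses duplicates;
// exceptAll (Spark 2.4+) behaves like EXCEPT ALL and keeps them.
val dup = df1.union(df1)       // every df1 row twice
dup.except(df1).show(false)    // empty
dup.exceptAll(df1).show(false) // one copy of each df1 row
```

    Because except compares whole rows, a modified record shows up once in each direction and cannot be told apart from an unrelated add/delete pair without a key; the join approach addresses that.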
    

    You can also use a join, then filter the result, to separate changed and unchanged data:

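    // every row of df1; df2's sale/exp/wastage are null where the key is missing from df2
    df1.join(df2, Seq("city", "product", "date"), "left").show(false)
    // every row of df2; df1's sale/exp/wastage are null where the key is missing from df1
    df1.join(df2, Seq("city", "product", "date"), "right").show(false)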
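    One way to label added, deleted, modified and unchanged rows in a single pass is a full outer join on the key columns, sketched below. It assumes that (city, product, date), the keys used in the joins above, uniquely identify a record, and that Spark is on the classpath; the sample rows and the column names in_old, in_new and status are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

val spark = SparkSession.builder().master("local[*]").appName("diff-demo").getOrCreate()
import spark.implicits._

val keys = Seq("city", "product", "date")  // assumed to identify a record
val values = Seq("sale", "exp", "wastage") // columns compared to detect modifications
val cols = keys ++ values

// Hypothetical rows with the question's column layout.
val df1 = Seq(
  ("city 1", "prod 1", "9/1/2017", 100, 200, 10),
  ("city 1", "prod 2", "9/2/2017", 150, 250, 20),
  ("city 2", "prod 1", "9/3/2017", 300, 400, 30)
).toDF(cols: _*)
val df2 = Seq(
  ("city 1", "prod 1", "9/1/2017", 100, 200, 10),
  ("city 1", "prod 2", "9/2/2017", 175, 250, 20),
  ("city 3", "prod 4", "9/4/2017", 500, 600, 50)
).toDF(cols: _*)

// Marker columns record which side a row came from, since value columns may be null themselves.
val oldSide = df1.withColumn("in_old", lit(true)).as("o")
val newSide = df2.withColumn("in_new", lit(true)).as("n")

// Null-safe equality (<=>), so null vs null counts as "unchanged".
val anyValueDiffers = values.map(c => !(col(s"o.$c") <=> col(s"n.$c"))).reduce(_ || _)

// A join on usingColumns coalesces the key columns; non-key columns keep their o./n. qualifiers.
val diff = oldSide
  .join(newSide, keys, "full_outer")
  .withColumn("status",
    when(col("in_old").isNull, "added")
      .when(col("in_new").isNull, "deleted")
      .when(anyValueDiffers, "modified")
      .otherwise("unchanged"))

diff.orderBy(keys.map(col): _*).show(false)
```

    If only the vanished records are needed, df1.join(df2, keys, "left_anti") returns the rows of df1 whose key has no match in df2, and swapping the two DataFrames returns the added rows.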
    

    Hope this helps!
