spark dataframe drop duplicates and keep first

孤街浪徒 2020-12-05 02:47

Question: in pandas, when dropping duplicates you can specify which columns to compare (subset) and which occurrence to keep (keep='first'). Is there an equivalent in Spark DataFrames?

Pandas:

df.sort_value         
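For reference, here is a minimal pandas sketch of the sort-then-deduplicate pattern the question describes. The column names key, time and value are hypothetical. It sorts by the ordering column so the wanted row comes first within each key, then keeps the first occurrence of each key.

```python
import pandas as pd

# Hypothetical data: two rows share key "a", one row has key "b".
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "time": [3, 1, 2],
        "value": ["late", "early", "only"],
    }
)

# Sort so the earliest row per key comes first, then keep only the
# first occurrence of each key (subset = columns compared for duplicates).
deduped = (
    df.sort_values("time")
    .drop_duplicates(subset=["key"], keep="first")
    .reset_index(drop=True)
)
print(deduped)
```

All other columns (here value) survive, which is the behaviour the question wants to reproduce in Spark.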


        
5 answers
  •  Happy的楠姐
    2020-12-05 03:19

    I did the following:

    dataframe.groupBy("uniqueColumn").min("time")
    

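    This groups by the given column and, within each group, takes the minimum time, i.e. the earliest timestamp per key. Note that the result contains only uniqueColumn and min(time); every other column is dropped, so it returns the first time per key rather than the whole first row.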
