Question: In pandas, when dropping duplicates you can specify which columns to consider and which row within each group to keep. Is there an equivalent in Spark DataFrames?
Pandas (the column names here are illustrative, matching the Spark answer below):
df.sort_values('time').drop_duplicates(subset=['uniqueColumn'], keep='first')
I did the following:
dataframe.groupBy("uniqueColumn").min("time")
This groups by the given column and, within each group, selects the minimum time, so one row per key survives (the earliest one). Be aware that the aggregation returns only uniqueColumn and min(time); any other columns are dropped, so you need a join back to the original DataFrame to recover full rows.
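Here is a minimal runnable sketch of this approach in PySpark, including the join back to recover the full rows. The data and the extra "payload" column are illustrative assumptions, not part of the original question.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative data: "uniqueColumn" is the dedup key, "time" orders the rows.
    dataframe = spark.createDataFrame(
        [("a", 3, "x"), ("a", 1, "y"), ("b", 2, "z")],
        ["uniqueColumn", "time", "payload"],
    )

    # One row per key, keeping the minimum time.
    # The result has only uniqueColumn and min(time); "payload" is gone.
    deduped = dataframe.groupBy("uniqueColumn").min("time")

    # Join back on key and time to recover the full original rows.
    full_rows = dataframe.join(
        deduped.withColumnRenamed("min(time)", "time"),
        on=["uniqueColumn", "time"],
    )
    full_rows.show()

Note that if two rows tie on the minimum time within a group, the join keeps both of them; breaking such ties would require an additional tiebreaker column.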