I have two Spark DataFrames:
Dataframe A:
|col_1 | col_2 | ... | col_n |
|val_1 | val_2 | ... | val_n |
and dataframe B:
If you want to keep only unique values and need strictly correct results, then union followed by dropDuplicates should do the trick:
columns_which_dont_change = [...]
old_df.union(new_df).dropDuplicates(subset=columns_which_dont_change)
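For a more complete picture, here is a minimal sketch of the same union-then-dropDuplicates idea wrapped in a helper. The names union_keep_unique, key_columns and run_local_demo, along with the sample rows and the id/label columns, are made up for illustration and are not from the question. Two things worth keeping in mind: union matches columns by position rather than by name, so both DataFrames are assumed to share the same column order, and when dropDuplicates is given a subset, Spark does not promise which of the duplicate rows survives.

```python
def union_keep_unique(old_df, new_df, key_columns):
    """Stack two DataFrames and keep one row per distinct combination of key_columns.

    Assumes old_df and new_df have the same columns in the same order,
    because union() resolves columns by position, not by name.
    """
    combined = old_df.union(new_df)
    # Which row is kept among duplicates on key_columns is not guaranteed.
    return combined.dropDuplicates(subset=key_columns)


def run_local_demo():
    """Hypothetical local demo; needs a working PySpark installation."""
    # Imported here so the helper above can be read and reused without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("union-dedup-sketch")
        .getOrCreate()
    )
    try:
        # Illustrative data only: the row with id 2 appears in both DataFrames.
        old_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
        new_df = spark.createDataFrame([(2, "b"), (3, "c")], ["id", "label"])
        union_keep_unique(old_df, new_df, ["id"]).orderBy("id").show()
    finally:
        spark.stop()
```

Calling run_local_demo() on a machine with PySpark available should print the combined rows with the duplicate id collapsed. If the two DataFrames might have their columns in a different order, unionByName is probably the safer call in place of union.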