How to update a pyspark dataframe with new values from another dataframe?

迷失自我 2020-12-19 16:04

I have two spark dataframes:

Dataframe A:

|col_1 | col_2 | ... | col_n |
|val_1 | val_2 | ... | val_n |

and dataframe B:

3 answers
  •  暗喜
    暗喜 (original poster)
    2020-12-19 16:49

    If you want to keep only unique rows and need strictly correct results, a union followed by dropDuplicates should do the trick:

    columns_which_dont_change = [...]
    old_df.union(new_df).dropDuplicates(subset=columns_which_dont_change)
    
