Concatenate two PySpark dataframes

Backend · Open · 10 answers · 1383 views
独厮守ぢ 2020-12-02 16:28

I'm trying to concatenate two PySpark dataframes, each of which has some columns the other lacks:

from pyspark.sql.functions import randn, rand

df_1 = sqlC         


        
10 Answers
  •  北荒 (OP)
     2020-12-02 17:13

    I'm a DWH developer turned PySpark developer. Below is what I would do:

        from pyspark.sql import SparkSession

        # The original snippet imported SparkSession but never created a session.
        spark = SparkSession.builder.getOrCreate()

        df_1.createOrReplaceTempView("tab_1")
        df_2.createOrReplaceTempView("tab_2")

        # Emulate a full outer join on "uniform" by unioning two left joins.
        df_concat = spark.sql("""
            select tab_1.id, tab_1.uniform, tab_1.normal, tab_2.normal_2
              from tab_1 left join tab_2 on tab_1.uniform = tab_2.uniform
            union
            select tab_2.id, tab_2.uniform, tab_1.normal, tab_2.normal_2
              from tab_2 left join tab_1 on tab_1.uniform = tab_2.uniform
        """)
        df_concat.show()
    

    Please let me know if this worked for you or met your need.
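    Since Spark 3.1, the DataFrame API offers a more direct route than the SQL union above: `unionByName(other, allowMissingColumns=True)` stacks the rows of both frames and fills each frame's missing columns with nulls. A minimal sketch, assuming the column layout implied by the answer's query (`id`, `uniform`, `normal` in `df_1`; `id`, `uniform`, `normal_2` in `df_2`) and using a local session plus tiny sample rows purely for illustration:

        from pyspark.sql import SparkSession

        # Local session for illustration only.
        spark = SparkSession.builder.master("local[1]").appName("concat-demo").getOrCreate()

        # Sample frames with the schemas assumed from the answer's SQL.
        df_1 = spark.createDataFrame([(0, 0.1, -0.5)], ["id", "uniform", "normal"])
        df_2 = spark.createDataFrame([(1, 0.7, 1.2)], ["id", "uniform", "normal_2"])

        # Spark >= 3.1: columns absent on either side are filled with NULL.
        # Result schema: df_1's columns first, then df_2's extra columns.
        df_concat = df_1.unionByName(df_2, allowMissingColumns=True)
        df_concat.show()

    Note this concatenates rows (one output row per input row), whereas the SQL above merges rows that share a `uniform` key; pick whichever matches your intent.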
