Concatenate two PySpark dataframes

独厮守ぢ 2020-12-02 16:28

I'm trying to concatenate two PySpark dataframes that each have some columns the other doesn't:

from pyspark.sql.functions import randn, rand

df_1 = sqlContext.range(0, 10)
df_2 = sqlContext.range(11, 20)

df_1 = df_1.select("id", rand(seed=10).alias("uniform"), randn(seed=27).alias("normal"))
df_2 = df_2.select("id", rand(seed=10).alias("uniform"), randn(seed=27).alias("normal_2"))

10 Answers
    情歌与酒 2020-12-02 17:11

    Maybe you can try creating the missing columns and then calling union (unionAll for Spark 1.6 or lower):

    from pyspark.sql.functions import lit

    cols = ['id', 'uniform', 'normal', 'normal_2']

    # Add each missing column as NULL so both dataframes share the same schema,
    # then select the columns in the same order before the union.
    df_1_new = df_1.withColumn("normal_2", lit(None)).select(cols)
    df_2_new = df_2.withColumn("normal", lit(None)).select(cols)

    result = df_1_new.union(df_2_new)
    
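    If you are on Spark 3.1 or later (an assumption, not stated in the question), a minimal sketch of the same result is DataFrame.unionByName with allowMissingColumns=True, which matches columns by name and fills the ones missing from either side with NULL:

    # Minimal sketch, assuming Spark 3.1+ and the df_1 / df_2 from the question.
    result = df_1.unionByName(df_2, allowMissingColumns=True)
    result.show()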
