I'm trying to concatenate two PySpark dataframes, each of which has some columns that the other lacks:
from pyspark.sql.functions import randn, rand
df_1 = sqlC
I'm a DWH (data warehouse) developer turned PySpark developer. Below is what I would do:
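from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the active session if there is one
# Register both dataframes as temporary views so they can be queried with SQL
df_1.createOrReplaceTempView("tab_1")
df_2.createOrReplaceTempView("tab_2")
# Two left joins combined with union keep the rows of both tables; columns without a match are null
df_concat = spark.sql("""
    select tab_1.id, tab_1.uniform, tab_1.normal, tab_2.normal_2
    from tab_1 left join tab_2 on tab_1.uniform = tab_2.uniform
    union
    select tab_2.id, tab_2.uniform, tab_1.normal, tab_2.normal_2
    from tab_2 left join tab_1 on tab_1.uniform = tab_2.uniform
""")
df_concat.show()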
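
To check what this query should return without starting Spark, here is a minimal plain-Python model of it. This is only a sketch: the sample rows and the helpers `left_join` and `concat` are made up for illustration, and the only thing taken from the question is the column names (`id`, `uniform`, `normal`, `normal_2`). `None` stands in for SQL null.

```python
# Plain-Python model of "tab_1 left join tab_2, union, tab_2 left join tab_1".
# The rows below are invented for illustration; None plays the role of SQL null.
tab_1 = [
    {"id": 0, "uniform": 0.25, "normal": -1.0},
    {"id": 1, "uniform": 0.50, "normal": 0.3},
]
tab_2 = [
    {"id": 11, "uniform": 0.50, "normal_2": 1.7},
    {"id": 12, "uniform": 0.75, "normal_2": -0.4},
]


def left_join(left, right, key):
    """Yield (left_row, right_row) pairs; right_row is {} when nothing matches."""
    for l_row in left:
        matches = [r for r in right if r[key] == l_row[key]]
        for r_row in matches or [{}]:
            yield l_row, r_row


def concat(tab_1, tab_2):
    rows = set()  # a set mimics the de-duplication done by SQL "union"
    # First select: id and uniform come from tab_1
    for a, b in left_join(tab_1, tab_2, "uniform"):
        rows.add((a["id"], a["uniform"], a["normal"], b.get("normal_2")))
    # Second select: id and uniform come from tab_2
    for b, a in left_join(tab_2, tab_1, "uniform"):
        rows.add((b["id"], b["uniform"], a.get("normal"), b["normal_2"]))
    return sorted(rows)


result = concat(tab_1, tab_2)
for row in result:
    print(row)  # (id, uniform, normal, normal_2)
```

In this model, a row whose `uniform` value matches in both tables shows up twice, once under each table's `id`, because the two select lists take `id` from different tables. If that is not what you want, joining on a different key or selecting a single `id` column may be a better fit.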
Please let me know whether this works for you and meets your need.