I'm trying to concatenate two PySpark DataFrames, each of which has some columns that the other lacks:
from pyspark.sql.functions import randn, rand
df_1 = sqlC
To concatenate multiple PySpark DataFrames into one:
from functools import reduce
reduce(lambda x, y: x.union(y), [df_1, df_2])
You can replace [df_1, df_2] with a list of any length.
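One caveat for the original question: `union` matches columns by position and expects every DataFrame to have the same number of columns, so it does not handle columns that exist on only one side. A minimal sketch of one way around that, assuming Spark 3.1 or later, is to reduce with `unionByName(..., allowMissingColumns=True)`, which matches columns by name and fills the missing ones with nulls. The helper name `union_all`, the `demo` function, and the sample column names are made up for illustration.

```python
from functools import reduce


def union_all(dfs):
    """Stack DataFrames by column name (hypothetical helper).

    Assumes Spark >= 3.1, where unionByName accepts allowMissingColumns.
    Columns absent from one DataFrame come back as nulls for its rows.
    """
    dfs = list(dfs)
    if not dfs:
        raise ValueError("need at least one DataFrame")
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        dfs,
    )


def demo():
    """Illustrative usage; requires pyspark to be installed."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("union-sketch").getOrCreate()
    # Sample data invented for this sketch: each frame has one column the other lacks.
    df_a = spark.createDataFrame([(1, "x")], ["id", "only_in_a"])
    df_b = spark.createDataFrame([(2, 3.5)], ["id", "only_in_b"])
    union_all([df_a, df_b]).show()
    spark.stop()
```

Calling `demo()` should show a result with the columns `id`, `only_in_a`, and `only_in_b`. On Spark versions older than 3.1, the usual workaround is to add the missing columns as null literals to each DataFrame before calling `union`.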