Spark unionAll multiple dataframes


For a set of dataframes

val df1 = sc.parallelize(1 to 4).map(i => (i,i*10)).toDF("id","x")
val df2 = sc.parallelize(1 to 4).map(i => (i,i*100)).toDF("id","x")

how can I union all of them together?

3 Answers
  •  感动是毒
    2020-11-27 17:58

    For pyspark you can do the following:

    from functools import reduce
    from pyspark.sql import DataFrame
    
    dfs = [df1,df2,df3]
    df = reduce(DataFrame.unionAll, dfs)
    

    It's also worth noting that the columns must appear in the same order in every dataframe for this to work. Because unionAll matches columns by position rather than by name, mismatched column orders can silently produce incorrect results!

    If you are using pyspark 2.3 or greater, you can use unionByName so you don't have to reorder the columns.
