Issue in Union with Empty dataframe

Submitted by 拈花ヽ惹草 on 2021-01-29 20:00:28

Question


I want to append DataFrames to an initially empty DataFrame inside a loop and then write the combined result to a location.

My Code -

val myMap = Map(1001 -> "rollNo='12'", 1002 -> "rollNo='13'")
val myHiveTableData = spark.table(<table_name>)
val allOtherIngestedData = spark.createDataFrame(sc.emptyRDD[Row], myHiveTableData.schema)
myMap.keys.foreach { i =>
  val filteredDataDf = myHiveTableData.where(myMap(i))
  val othersDf = myHiveTableData.except(filteredDataDf)
  allOtherIngestedData.union(othersDf)
  filteredDataDf.write.format("parquet")................... //Writing to a Location in Parquet
}

allOtherIngestedData.write. ..................... //Writing to a Location in Parquet 

But allOtherIngestedData ends up with no data.

Running allOtherIngestedData.count returns Long = 0.

So how do I append to an empty DataFrame?

The same can be observed here too -

val rawDataHiveDf = spark.table(allInputs("inputHiveTableName"))
val allOthersDf : DataFrame = spark.createDataFrame(sc.emptyRDD[Row],rawDataHiveDf.schema)
allOthersDf.union(rawDataHiveDf)
allOthersDf.count

Output -

rawDataHiveDf: org.apache.spark.sql.DataFrame = [eventclassversion: string, serialnumber: string ... 33 more fields]
allOthersDf: org.apache.spark.sql.DataFrame = [eventclassversion: string, serialnumber: string ... 33 more fields]
res46: Long = 0

Scala Version = 2.11

Apache Spark = 2.4.3


Answer 1:


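This works on a sample DataFrame, as long as the DataFrame returned by the union is the one that gets used. union and unionByName return a new DataFrame and leave the DataFrame they are called on unchanged.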

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.lit

    val df = spark.range(2).withColumn("name", lit("foo"))
    df.show(false)
    df.printSchema()
    /**
      * +---+----+
      * |id |name|
      * +---+----+
      * |0  |foo |
      * |1  |foo |
      * +---+----+
      *
      * root
      * |-- id: long (nullable = false)
      * |-- name: string (nullable = false)
      */
    val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row],df.schema)
    emptyDF.show(false)

    /**
      * +---+----+
      * |id |name|
      * +---+----+
      * +---+----+
      */

    emptyDF.unionByName(df)
      .show(false)
    /**
      * +---+----+
      * |id |name|
      * +---+----+
      * |0  |foo |
      * |1  |foo |
      * +---+----+
      */
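
Applied to the loop in the question, the same idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample data, the `filters` map, and the names `source`, `empty`, `stillEmpty`, `allOthers` and `total` are hypothetical stand-ins (the real Hive table and write paths are not known), and a local SparkSession is created so the snippet can run outside spark-shell. The accumulated DataFrame is carried through the loop with foldLeft instead of calling union and dropping its result.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.lit

// Assumption: a local session for illustration; in spark-shell the existing session is reused.
val spark = SparkSession.builder().master("local[*]").appName("union-sketch").getOrCreate()

// Hypothetical stand-in for the Hive table: ids 0 to 3 with a constant name column.
val source: DataFrame = spark.range(4).withColumn("name", lit("foo"))

// Hypothetical filter conditions, mirroring myMap in the question.
val filters = Map(1001 -> "id = 0", 1002 -> "id = 1")

// Empty DataFrame with the same schema as the source.
val empty: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], source.schema)

// union returns a new DataFrame; the result is discarded here, so `empty` stays empty.
empty.union(source)
val stillEmpty = empty.count()

// Thread the accumulated DataFrame through the loop so every union result is kept.
val allOthers: DataFrame = filters.values.foldLeft(empty) { (acc, condition) =>
  val filtered = source.where(condition)
  val others = source.except(filtered)
  // The per-filter write of `filtered` from the question would go here.
  acc.union(others)
}
val total = allOthers.count()
```

Reassigning a var inside foreach (for example, acc = acc.union(others)) would have the same effect; foldLeft just avoids the mutable variable.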


Source: https://stackoverflow.com/questions/63057336/issue-in-union-with-empty-dataframe
