Question
I want to append DataFrames to an initially empty DataFrame inside a loop, and finally write the result to a location.
My code:
import org.apache.spark.sql.Row

val myMap = Map(1001 -> "rollNo='12'", 1002 -> "rollNo='13'")
val myHiveTableData = spark.table(<table_name>)
val allOtherIngestedData = spark.createDataFrame(sc.emptyRDD[Row], myHiveTableData.schema)
myMap.keys.foreach { i =>
  val filteredDataDf = myHiveTableData.where(myMap(i))
  val othersDf = myHiveTableData.except(filteredDataDf)
  allOtherIngestedData.union(othersDf)
  filteredDataDf.write.format("parquet")................... // Writing to a location in Parquet
}
allOtherIngestedData.write. ..................... // Writing to a location in Parquet
But allOtherIngestedData contains no data.
If I run allOtherIngestedData.count, it returns Long = 0.
So how do I append to an empty DataFrame?
The same behavior shows up here too:
val rawDataHiveDf = spark.table(allInputs("inputHiveTableName"))
val allOthersDf : DataFrame = spark.createDataFrame(sc.emptyRDD[Row],rawDataHiveDf.schema)
allOthersDf.union(rawDataHiveDf)
allOthersDf.count
Output:
rawDataHiveDf: org.apache.spark.sql.DataFrame = [eventclassversion: string, serialnumber: string ... 33 more fields]
allOthersDf: org.apache.spark.sql.DataFrame = [eventclassversion: string, serialnumber: string ... 33 more fields]
res46: Long = 0
Scala version: 2.11
Apache Spark version: 2.4.3
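Answer 1:
Union works as expected on a sample DataFrame. The important detail is that union and unionByName return a new DataFrame and never modify emptyDF in place, so you have to use the returned value: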
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.lit

val df = spark.range(2).withColumn("name", lit("foo"))
df.show(false)
df.printSchema()
/**
* +---+----+
* |id |name|
* +---+----+
* |0 |foo |
* |1 |foo |
* +---+----+
*
* root
* |-- id: long (nullable = false)
* |-- name: string (nullable = false)
*/
val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row],df.schema)
emptyDF.show(false)
/**
* +---+----+
* |id |name|
* +---+----+
* +---+----+
*/
emptyDF.unionByName(df)
.show(false)
/**
* +---+----+
* |id |name|
* +---+----+
* |0 |foo |
* |1 |foo |
* +---+----+
*/
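Applied to the loop from the question, the smallest change is to declare allOtherIngestedData as a var and reassign it on each iteration (allOtherIngestedData = allOtherIngestedData.union(othersDf)). The sketch below instead uses foldLeft, so each union result feeds the next step without a mutable variable. It assumes a local SparkSession and hypothetical sample data standing in for the Hive table; the names UnionInLoop and unionOthers are illustrative only, not part of any Spark API.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object UnionInLoop {
  // Hypothetical local session; in spark-shell you would reuse the existing `spark`.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("union-in-loop-sketch")
    .getOrCreate()

  // Starts from an empty DataFrame with the table's schema and unions in the
  // `othersDf` computed for every filter condition. foldLeft passes the DataFrame
  // *returned* by each union into the next iteration, because union never
  // modifies the DataFrame it is called on.
  def unionOthers(table: DataFrame, filters: Map[Int, String]): DataFrame = {
    val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], table.schema)
    filters.values.foldLeft(empty) { (acc, condition) =>
      val filteredDataDf = table.where(condition)          // rows matching this filter
      val othersDf = table.except(filteredDataDf)          // all remaining rows
      acc.union(othersDf)                                  // new DataFrame, used as next acc
    }
  }

  def main(args: Array[String]): Unit = {
    import spark.implicits._
    // Hypothetical sample data standing in for the Hive table.
    val table = Seq(("12", "a"), ("13", "b"), ("14", "c")).toDF("rollNo", "name")
    val filters = Map(1001 -> "rollNo='12'", 1002 -> "rollNo='13'")
    val result = unionOthers(table, filters)
    println(result.count()) // prints 4
    spark.stop()
  }
}
```

With this sample data each filter's othersDf holds two rows, so the combined DataFrame has four. Note that the row with rollNo='14' matches neither filter and therefore appears once per map entry; if the intent is to collect only rows that match none of the filters, a different query would be needed (that intent is an assumption, not something the question states).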
Source: https://stackoverflow.com/questions/63057336/issue-in-union-with-empty-dataframe