Question:
I have a DataFrame in Spark that I am saving to Hive as a table, but I am getting the error message below:
java.lang.RuntimeException:
com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector
does not allow create table as select.
at scala.sys.package$.error(package.scala:27)
Can anyone please help me with how to save this as a table in Hive?
val df3 = df1.join(df2, df1("inv_num") === df2("inv_num")) // join both DataFrames on the invoice-number column
  .withColumn("finalSalary", when(df1("salary") < df2("salary"), df2("salary") - df1("salary"))
    .otherwise(
      when(df1("salary") > df2("salary"), df1("salary") + df2("salary")) // e.g. 5000 + 3000 = 8000
        .otherwise(df2("salary")))) // otherwise take the salary from the second DataFrame
  .drop(df1("salary"))
  .drop(df2("salary"))
  .withColumnRenamed("finalSalary", "salary")
// The code below is not working; when I execute it, it throws:
java.lang.RuntimeException:
com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector
does not allow create table as select.
at scala.sys.package$.error(package.scala:27)
df3.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "dbname")
  .option("table", "tablename")
  .mode("Append")
  .saveAsTable("tablename")
Note: The table already exists in the database, and I'm using HDP 3.x.
Answer 1:
According to the Spark documentation, the behaviour of the saveAsTable function depends on the save mode used; the default is ErrorIfExists.
In your case, since you are using Hive, try insertInto instead, but keep in mind that the column order of the DataFrame must match the column order of the destination table.
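A minimal sketch of that suggestion, reusing the database/table names from the question and assuming `df3` and a Hive-enabled `spark` session from the surrounding code (note that on HDP 3.x, writes to Hive-managed ACID tables still normally have to go through the HiveWarehouseConnector, so this path applies to tables Spark can write directly):

```scala
import org.apache.spark.sql.functions.col

// insertInto matches columns by POSITION, not by name, so reorder the
// DataFrame's columns to the destination table's schema first.
val target  = spark.table("dbname.tablename")           // read the destination schema
val aligned = df3.select(target.columns.map(col): _*)   // put columns in the same order

aligned.write
  .mode("append")                 // default SaveMode is ErrorIfExists; append instead
  .insertInto("dbname.tablename") // existing table, so no CREATE TABLE AS SELECT
```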
Answer 2:
Try registering a temporary view, selecting from it with spark.sql(), and then writing the result:
df3.createOrReplaceTempView("temp_view") // registerTempTable is deprecated; same idea
spark.sql("SELECT salary FROM temp_view")
  .write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("database", "dbname")
  .option("table", "tablename")
  .mode("append")
  .save()
Answer 3:
See if the solution below works for you:
val df3 = df1.join(df2, df1("inv_num") === df2("inv_num")) // join both DataFrames on the invoice-number column
  .withColumn("finalSalary", when(df1("salary") < df2("salary"), df2("salary") - df1("salary"))
    .otherwise(
      when(df1("salary") > df2("salary"), df1("salary") + df2("salary")) // e.g. 5000 + 3000 = 8000
        .otherwise(df2("salary")))) // otherwise take the salary from the second DataFrame
  .drop(df1("salary"))
  .drop(df2("salary"))
  .withColumnRenamed("finalSalary", "salary")

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
df3.createOrReplaceTempView("<temp-tbl-name>")
hive.setDatabase("<db-name>")
hive.createTable("<tbl-name>")
  .ifNotExists()
  .column("salary", "double") // adjust the column list and types to your schema
  .create()                   // the builder does nothing until create() is called
spark.sql("SELECT salary FROM <temp-tbl-name>")
  .write
  .format(HIVE_WAREHOUSE_CONNECTOR)
  .mode("append")
  .option("table", "<tbl-name>")
  .save()
Source: https://stackoverflow.com/questions/61819955/saveastable-in-spark-scala-hdp3-x