Question:
I have a DataFrame in Spark that I am saving to Hive as a table, but I am getting the error message below:
java.lang.RuntimeException:
com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector
does not allow create table as select.
at scala.sys.package$.error(package.scala:27)
Can anyone please help me with how to save this as a table in Hive?
val df3 = df1.join(df2, df1("inv_num") === df2("inv_num")) // join both DataFrames on the invoice-number column
  .withColumn("finalSalary", when(df1("salary") < df2("salary"), df2("salary") - df1("salary"))
    .otherwise(
      when(df1("salary") > df2("salary"), df1("salary") + df2("salary")) // e.g. 5000 + 3000 = 8000
        .otherwise(df2("salary")))) // otherwise take the salary from the second DataFrame
  .drop(df1("salary"))
  .drop(df2("salary"))
  .withColumnRenamed("finalSalary", "salary")
// The code below is not working; when I execute it, it throws:
java.lang.RuntimeException:
com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector
does not allow create table as select.
at scala.sys.package$.error(package.scala:27)
df3.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "dbname")
  .option("table", "tablename")
  .mode("Append")
  .saveAsTable("tablename")
Note: The table already exists in the database, and I'm using HDP 3.x.
Answer 1:
According to the Spark documentation, the behaviour of the saveAsTable function depends on the save mode used; the default is ErrorIfExists.
In your case, since you are using Hive, try insertInto instead, but keep in mind that the column order of the DataFrame must match the column order of the destination table.
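A minimal sketch of that suggestion, reusing the database/table names from the question and assuming `df3` and a Hive-enabled `spark` session from the surrounding code (note that on HDP 3.x, writes to Hive-managed ACID tables still normally have to go through the HiveWarehouseConnector, so this path applies to tables Spark can write directly):

```scala
import org.apache.spark.sql.functions.col

// insertInto matches columns by POSITION, not by name, so reorder the
// DataFrame's columns to the destination table's schema first.
val target  = spark.table("dbname.tablename")           // read the destination schema
val aligned = df3.select(target.columns.map(col): _*)   // put columns in the same order

aligned.write
  .mode("append")                 // default SaveMode is ErrorIfExists; append instead
  .insertInto("dbname.tablename") // existing table, so no CREATE TABLE AS SELECT
```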
Answer 2:
Try registering a temporary view, selecting from it with spark.sql(), and then writing the result:
df3.createOrReplaceTempView("temp_view") // registerTempTable is deprecated; same idea
spark.sql("SELECT salary FROM temp_view")
  .write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("database", "dbname")
  .option("table", "tablename")
  .mode("append")
  .save()
Answer 3:
See if the solution below works for you:
val df3 = df1.join(df2, df1("inv_num") === df2("inv_num")) // join both DataFrames on the invoice-number column
  .withColumn("finalSalary", when(df1("salary") < df2("salary"), df2("salary") - df1("salary"))
    .otherwise(
      when(df1("salary") > df2("salary"), df1("salary") + df2("salary")) // e.g. 5000 + 3000 = 8000
        .otherwise(df2("salary")))) // otherwise take the salary from the second DataFrame
  .drop(df1("salary"))
  .drop(df2("salary"))
  .withColumnRenamed("finalSalary", "salary")

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
df3.createOrReplaceTempView("<temp-tbl-name>")
hive.setDatabase("<db-name>")
hive.createTable("<tbl-name>")
  .ifNotExists()
  .column("salary", "double") // adjust the column list and types to your schema
  .create()                   // the builder does nothing until create() is called
spark.sql("SELECT salary FROM <temp-tbl-name>")
  .write
  .format(HIVE_WAREHOUSE_CONNECTOR)
  .mode("append")
  .option("table", "<tbl-name>")
  .save()
Source: https://stackoverflow.com/questions/61819955/saveastable-in-spark-scala-hdp3-x