How to update an ORC Hive table from Spark using Scala

Submitted by 感情迁移 on 2019-12-23 12:38:34

Question


I would like to update a Hive table that is in ORC format. I am able to update it from my Ambari Hive view, but I cannot run the same UPDATE statement from Scala (spark-shell).

objHiveContext.sql("select * from table_name") works and I can see the data, but when I run

objHiveContext.sql("update table_name set column_name='testing'") it fails with a NoViableAltException (invalid syntax near 'update'), whereas I am able to update from the Ambari view (I have set all the required configurations, i.e. TBLPROPERTIES "orc.compress"="NONE", transactional=true, etc.).

I also tried INSERT INTO with CASE statements and so on, but couldn't get it to work. Can we UPDATE Hive ORC tables from Spark? If yes, what is the procedure?
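For context, the "INSERT with CASE" workaround mentioned above usually means rewriting the whole table with INSERT OVERWRITE, since pre-ACID Hive has no in-place UPDATE. A minimal sketch of how that statement could be built (not from the original post; the table and column names table_name, id, column_name and the predicate id = 1 are all illustrative):

```scala
// Sketch: simulate UPDATE on a Hive ORC table by rewriting it with
// INSERT OVERWRITE + CASE. All identifiers here are illustrative.
val updateAsOverwrite =
  """INSERT OVERWRITE TABLE table_name
    |SELECT id,
    |       CASE WHEN id = 1 THEN 'testing' ELSE column_name END AS column_name
    |FROM table_name""".stripMargin

// With a HiveContext named objHiveContext (as in the question) you would run:
// objHiveContext.sql(updateAsOverwrite)
println(updateAsOverwrite)
```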

I imported the following:

import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._

Note: I didn't apply any partitioning or bucketing on that table. If I apply bucketing, I am even unable to view the data when it is stored as ORC. Hive version: 1.2.1, Spark version: 1.4.1, Scala version: 2.10.6.


Answer 1:


Have you tried the DataFrame.write API using SaveMode.Append, per the link below?

http://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options

Use "orc" as the format and "append" as the save mode; examples are in the link above.
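The append-based workflow could be sketched as follows. This is only a sketch assuming a spark-shell session (where sc already exists) on Spark 1.4+; table_name and the DataFrame newRows are illustrative:

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Assumes an existing SparkContext `sc`, as provided by spark-shell.
val hiveCtx = new HiveContext(sc)

// `newRows` stands for whatever DataFrame holds the rows to add;
// here it is illustrated by re-reading the table itself.
val newRows = hiveCtx.sql("select * from table_name")

// Append the rows to the existing ORC-backed Hive table.
newRows.write
  .format("orc")
  .mode(SaveMode.Append)
  .saveAsTable("table_name")
```

Note that this appends new rows rather than changing existing ones in place; to modify existing values you would still have to rewrite the table.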




Answer 2:


Answer to sudhir's question:

How to mention the database name while saving?

You can prefix the table name with the database name. For example, if your database name is orc_db and the table name is yahoo_orc_table, you can write:

myData.write.format("orc").mode(SaveMode.Append).saveAsTable("orc_db.yahoo_orc_table")



Source: https://stackoverflow.com/questions/34534610/how-to-updata-an-orc-hive-table-form-spark-using-scala
