Unable to view data of Hive tables after update in Spark

 ̄綄美尐妖づ submitted on 2019-12-24 13:54:03

Question


Case: I have a table HiveTest, an ORC table with transactions enabled. I loaded it in the Spark shell and viewed the data:

var rdd = objHiveContext.sql("select * from HiveTest")
rdd.show()

--- Able to view data

Then I went to my Hive shell (or Ambari) and updated the table, for example:

hive> update HiveTest set name='test'   ---Done and success
hive> select * from HiveTest -- able to view updated data

Now when I come back to Spark and run the query again, I cannot see any data except the column names:

scala> var rdd1 = objHiveContext.sql("select * from HiveTest")
scala> rdd1.show()

-- This time only the column names are printed; no data comes back

Issue 2: I am unable to update from Spark SQL. When I run scala> objHiveContext.sql("update HiveTest set name='test'") I get the error below:

org.apache.spark.sql.AnalysisException:
Unsupported language features in query: INSERT INTO HiveTest values(1,'sudhir','Software',1,'IT')
TOK_QUERY 0, 0,17, 0
  TOK_FROM 0, -1,17, 0
    TOK_VIRTUAL_TABLE 0, -1,17, 0
      TOK_VIRTUAL_TABREF 0, -1,-1, 0
        TOK_ANONYMOUS 0, -1,-1, 0
      TOK_VALUES_TABLE 1, 6,17, 28
        TOK_VALUE_ROW 1, 7,17, 28
          1 1, 8,8, 28
          'sudhir' 1, 10,10, 30
          'Software' 1, 12,12, 39
          1 1, 14,14, 50
          'IT' 1, 16,16, 52
  TOK_INSERT 1, 0,-1, 12
    TOK_INSERT_INTO 1, 0,4, 12
      TOK_TAB 1, 4,4, 12
        TOK_TABNAME 1, 4,4, 12
          HiveTest 1, 4,4, 12
    TOK_SELECT 0, -1,-1, 0
      TOK_SELEXPR 0, -1,-1, 0
        TOK_ALLCOLREF 0, -1,-1, 0

scala.NotImplementedError: No parse rules for:
 TOK_VIRTUAL_TABLE 0, -1,17, 0
  TOK_VIRTUAL_TABREF 0, -1,-1, 0
    TOK_ANONYMOUS 0, -1,-1, 0
  TOK_VALUES_TABLE 1, 6,17, 28
    TOK_VALUE_ROW 1, 7,17, 28
      1 1, 8,8, 28
      'sudhir' 1, 10,10, 30
      'Software' 1, 12,12, 39
      1 1, 14,14, 50
      'IT' 1, 16,16, 52

org.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1235)

This error is for an INSERT INTO statement; the UPDATE statement produces the same sort of error.


Answer 1:


Have you tried objHiveContext.refreshTable("HiveTest")?

Spark SQL aggressively caches Hive metastore data.

If an update happens outside of Spark SQL, you might see unexpected results because Spark SQL's cached copy of the Hive metastore data is out of date.

Here's some more info:

http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext

The docs mostly mention Parquet, but this likely applies to ORC and other file formats.

With JSON, for example, if you add new files into a directory outside of Spark SQL, you'll need to call hiveContext.refreshTable() within Spark SQL to see the new data.
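
As a minimal sketch of the refresh suggested above, assuming a Spark 1.x spark-shell session where `sc` is the shell's SparkContext and the HiveTest table from the question exists (the `objHiveContext` name is taken from the question; how it was created there is an assumption):

```scala
// Sketch for spark-shell on Spark 1.x; assumes `sc` is provided by the shell
// and that the Hive table HiveTest from the question already exists.
import org.apache.spark.sql.hive.HiveContext

val objHiveContext = new HiveContext(sc)

// First read: Spark SQL may cache the table's metadata from here on.
objHiveContext.sql("select * from HiveTest").show()

// ... the table is updated outside Spark, e.g. from the Hive shell ...

// Invalidate the cached metadata so the next query reloads it from the metastore.
objHiveContext.refreshTable("HiveTest")

// Second read: runs against the refreshed metadata.
objHiveContext.sql("select * from HiveTest").show()
```

On Spark 2.x and later the equivalent call would be `spark.catalog.refreshTable("HiveTest")`. Whether the refresh alone makes the updated rows visible for a transactional ORC table is not confirmed in this thread.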




Answer 2:


Spark SQL does not support UPDATE and DELETE transactions so far. However, inserts can still be done.



Source: https://stackoverflow.com/questions/34661547/unable-to-view-data-of-hive-tables-after-update-in-spark
