Unable to view data of Hive tables after update in Spark

Question

Case: I have a table HiveTest, which is an ORC table with transactions enabled. I loaded it in the Spark shell and viewed the data:

var rdd= objHiveContext.sql("select * from HiveTest")
rdd.show()

--- Able to view data

Then I went to my Hive shell (or Ambari) and updated the table, for example:

hive> update HiveTest set name='test'   ---Done and success
hive> select * from HiveTest -- able to view updated data

Now when I come back to Spark and run the query again, I cannot see any data, only the column names:

scala> var rdd1 = objHiveContext.sql("select * from HiveTest")
scala> rdd1.show()

-- This time only the column names are printed; no data comes back

Issue 2: I am unable to update from Spark SQL. When I run scala> objHiveContext.sql("update HiveTest set name='test'"), I get the error below:

org.apache.spark.sql.AnalysisException:
Unsupported language features in query: INSERT INTO HiveTest values(1,'sudhir','Software',1,'IT')
TOK_QUERY 0, 0,17, 0
  TOK_FROM 0, -1,17, 0
    TOK_VIRTUAL_TABLE 0, -1,17, 0
      TOK_VIRTUAL_TABREF 0, -1,-1, 0
        TOK_ANONYMOUS 0, -1,-1, 0
      TOK_VALUES_TABLE 1, 6,17, 28
        TOK_VALUE_ROW 1, 7,17, 28
          1 1, 8,8, 28
          'sudhir' 1, 10,10, 30
          'Software' 1, 12,12, 39
          1 1, 14,14, 50
          'IT' 1, 16,16, 52
  TOK_INSERT 1, 0,-1, 12
    TOK_INSERT_INTO 1, 0,4, 12
      TOK_TAB 1, 4,4, 12
        TOK_TABNAME 1, 4,4, 12
          HiveTest 1, 4,4, 12
    TOK_SELECT 0, -1,-1, 0
      TOK_SELEXPR 0, -1,-1, 0
        TOK_ALLCOLREF 0, -1,-1, 0

scala.NotImplementedError: No parse rules for:
 TOK_VIRTUAL_TABLE 0, -1,17, 0
  TOK_VIRTUAL_TABREF 0, -1,-1, 0
    TOK_ANONYMOUS 0, -1,-1, 0
  TOK_VALUES_TABLE 1, 6,17, 28
    TOK_VALUE_ROW 1, 7,17, 28
      1 1, 8,8, 28
      'sudhir' 1, 10,10, 30
      'Software' 1, 12,12, 39
      1 1, 14,14, 50
      'IT' 1, 16,16, 52

org.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1235)

This error is from an INSERT INTO statement; the UPDATE statement fails with the same sort of error.

Answer 1:

Have you tried objHiveContext.refreshTable("HiveTest")?

Spark SQL aggressively caches Hive metastore data.

If an update happens outside of Spark SQL, you might experience some unexpected results as Spark SQL's version of the Hive metastore is out of date.

Here's some more info:

http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext

The docs mostly mention Parquet, but this likely applies to ORC and other file formats.

With JSON, for example, if you add new files into a directory outside of Spark SQL, you'll need to call hiveContext.refreshTable() within Spark SQL to see the new data.
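
A minimal sketch of the refresh step described above, assuming a Spark 1.x build with Hive support and the table name HiveTest from the question. The object name RefreshHiveTest is made up for illustration, and the thread does not confirm that a refresh is enough for a transactional ORC table, so treat this as something to try rather than a verified fix.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical driver object; in spark-shell you would reuse the
// existing sc / objHiveContext instead of creating new ones.
object RefreshHiveTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RefreshHiveTest"))
    val objHiveContext = new HiveContext(sc)

    // First read: Spark SQL caches the table's metastore information.
    objHiveContext.sql("select * from HiveTest").show()

    // ... the table is updated outside Spark here (Hive shell or Ambari) ...

    // Invalidate the cached metadata for this table, then read again.
    objHiveContext.refreshTable("HiveTest")
    objHiveContext.sql("select * from HiveTest").show()

    sc.stop()
  }
}
```

In the Spark shell, the two relevant lines are the refreshTable call followed by a fresh select; reusing the old rdd variable without re-running the query would not pick up the change.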

Answer 2:

Spark SQL does not support UPDATE and DELETE transactions so far. However, INSERT can still be done.
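
As a rough illustration of the INSERT route, here is a sketch that avoids the INSERT ... VALUES form rejected in the question by inserting from a temporary table instead. It assumes Spark 1.x with Hive support; the object name InsertViaSelect, the temporary table name new_rows, and the column order and types of the sample row (taken from the error message) are all assumptions, since the thread never shows the schema of HiveTest. The thread also does not say whether writing from Spark into a transactional table is safe, so this is best tried on a test table first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical driver object; in spark-shell reuse objHiveContext instead.
object InsertViaSelect {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("InsertViaSelect"))
    val objHiveContext = new HiveContext(sc)

    // Sample row copied from the error message in the question.
    // ASSUMPTION: HiveTest has five columns in this order with compatible types.
    val newRows = objHiveContext.createDataFrame(
      Seq((1, "sudhir", "Software", 1, "IT")))

    // Expose the DataFrame to SQL under a made-up temporary table name.
    newRows.registerTempTable("new_rows")

    // INSERT ... SELECT instead of INSERT ... VALUES.
    objHiveContext.sql("INSERT INTO TABLE HiveTest SELECT * FROM new_rows")

    sc.stop()
  }
}
```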

Source: https://stackoverflow.com/questions/34661547/unable-to-view-data-of-hive-tables-after-update-in-spark

Tags

scala

apache-spark

Hive

hivecontext

spark-hive