HiveContext

HiveException: Failed to create spark client

Submitted by 故事扮演 on 2020-01-05 04:38:13
Question: 1) I have created a SQL file in which we collect data from two different Hive tables and insert it into a single Hive table. 2) We invoke this SQL file using a shell script. 3) Sample Spark settings:

SET hive.execution.engine=spark;
SET spark.master=yarn-cluster;
SET spark.app.name="ABC_${hiveconf:PRC_DT}_${hiveconf:JOB_ID}";
--SET spark.driver.memory=8g;
--SET spark.executor.memory=8g;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.stats.fetch.column.stats=true;
SET hive

Spark SQL sql("<some aggregate query>").first().getDouble(0) gives me inconsistent results

Submitted by 放肆的年华 on 2019-12-24 17:04:30
Question: I have the query below, which is supposed to compute the average of a column's values and return a single number:

val avgVal = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'").first().getDouble(0)

I'm seeing inconsistent behavior at this statement: it often fails with the error below, yet the same query gives non-NULL results when executed through Hive. 18/05/10 11:01:12 ERROR ApplicationMaster: User class threw
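Note: if avg() sees no matching rows (or only NULL amounts), the query returns a single row containing NULL, and getDouble(0) throws, which would explain the intermittent failures. A minimal null-safe sketch, assuming the same hiveContext as in the question:

// Fetch the single result row first, then guard against a NULL aggregate
// before extracting it as a Double.
val row = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'").first()
val avgVal = if (row.isNullAt(0)) 0.0 else row.getDouble(0)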

Unable to view data of Hive tables after update in Spark

Submitted by  ̄綄美尐妖づ on 2019-12-24 13:54:03
Question: Case: I have a table HiveTest, which is an ORC table with transactions set to true. I loaded it in the Spark shell and viewed the data:

var rdd = objHiveContext.sql("select * from HiveTest")
rdd.show() // able to view data

Then I went to my Hive shell (or Ambari) and updated the table, for example:

hive> update HiveTest set name='test' -- done, success
hive> select * from HiveTest -- able to view the updated data

Now, when I come back to Spark and run the query again, I cannot view any data except the column names: scala> var rdd1=
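Note: an existing HiveContext caches table metadata, so rows rewritten by a Hive-side ACID update may not show up until that cache is invalidated; even then, Spark 1.x generally only sees ACID data after the table has been compacted. A sketch of the refresh step, assuming the Spark 1.x API:

// Drop Spark's cached metadata for the table, then re-read it.
objHiveContext.refreshTable("HiveTest")
var rdd1 = objHiveContext.sql("select * from HiveTest")
rdd1.show()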

How to update an ORC Hive table from Spark using Scala

Submitted by 感情迁移 on 2019-12-23 12:38:34
Question: I would like to update a Hive table which is in ORC format. I'm able to update it from my Ambari Hive view, but unable to run the same update statement from Scala (spark-shell). With objHiveContext.sql("select * from table_name") I am able to see the data, but when I run objHiveContext.sql("update table_name set column_name='testing'") it fails with a NoViableAltException (invalid syntax near 'update', etc.), whereas I'm able to update from the Ambari view (as I set all the required configurations, i.e.
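Note: the NoViableAltException comes from Spark 1.x's HiveQL parser, which does not accept UPDATE statements at all, regardless of the ORC/ACID settings; the Ambari view goes through Hive's own parser, which does. A workaround sketch that expresses the update as a read-transform-write, with table_name_updated as a hypothetical staging table:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.lit

// Re-derive the table contents with the changed column...
val updated = objHiveContext.table("table_name").withColumn("column_name", lit("testing"))
// ...and write the result to a staging table, to avoid overwriting a table
// while reading from it in the same job.
updated.write.mode(SaveMode.Overwrite).saveAsTable("table_name_updated")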

Spark HiveContext: Insert Overwrite the same table it is read from

Submitted by 左心房为你撑大大i on 2019-12-23 05:27:10
Question: I want to apply SCD1 and SCD2 using PySpark with a HiveContext. In my approach, I read the incremental data and the target table, then join them for the upsert. I call registerTempTable on all the source DataFrames. When I try to write the final dataset into the target table, I hit the issue that insert overwrite is not possible into the table it is read from. Please suggest a solution for this. I do not want to write intermediate data into a physical table and read it
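Note: even though the question hopes to avoid an intermediate physical table, the usual reliable workaround in Spark 1.x is exactly that, since caching alone is not guaranteed to break the read-write dependency. A sketch of the idea in Scala (the question uses PySpark, but the approach is the same); finalDf and the table names are placeholders:

import org.apache.spark.sql.SaveMode

// Materialize the upserted result into a staging table first...
finalDf.write.mode(SaveMode.Overwrite).saveAsTable("stg.target_tmp")
// ...then overwrite the target from the staging copy, so the insert no
// longer reads from the table it writes to.
hiveContext.sql("insert overwrite table tgt.target select * from stg.target_tmp")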

Spark job that uses HiveContext failing in Oozie

Submitted by 亡梦爱人 on 2019-12-12 02:13:14
Question: In one of our pipelines we do aggregation using Spark (Java), orchestrated by Oozie. The pipeline writes the aggregated data to an ORC file using the following lines:

HiveContext hc = new HiveContext(sc);
DataFrame modifiedFrame = hc.createDataFrame(aggregateddatainrdd, schema);
modifiedFrame.write().format("org.apache.spark.sql.hive.orc").partitionBy("partition_column_name").save(output);

When the Spark action in the Oozie job gets triggered it throws the following

Unable to write data to Hive using Spark

Submitted by 拟墨画扇 on 2019-12-11 17:09:41
Question: I am using Spark 1.6 and creating a HiveContext from the Spark context. When I save the data into Hive it gives an error. I am using a Cloudera VM: Hive is inside the VM and Spark is on my own system, and I can access the VM by IP. I have started the Thrift server and HiveServer2 on the VM, and I used the Thrift server URI for hive.metastore.uris:

val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://IP:9083")
............
............
df.write.mode(SaveMode.Append)
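Note: setting hive.metastore.uris on the HiveContext before any table is touched usually works, but if the metastore client has already been initialized the setting is ignored; putting the property in Spark's hive-site.xml avoids the ordering issue entirely. A sketch of the write path under the question's setup (IP and the table name are placeholders):

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Point the metastore client at the VM before touching any table.
hiveContext.setConf("hive.metastore.uris", "thrift://IP:9083")
// Append into an existing Hive table, qualifying it with its database.
df.write.mode(SaveMode.Append).saveAsTable("test.my_table")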

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/analysis/OverrideFunctionRegistry

Submitted by 无人久伴 on 2019-12-10 18:54:23
Question: I have tried the below code in Spark with Scala; attaching the code and pom.xml:

package com.Spark.ConnectToHadoop

import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.rdd.RDD
//import groovy.sql.Sql.CreateStatementCommand
//import org.apache.spark.SparkConf

object CountWords {
  def main(args: Array[String]) {
    val objConf =
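Note: OverrideFunctionRegistry moved between Spark releases, so a NoClassDefFoundError for it at runtime almost always means the spark-core, spark-sql and spark-hive artifacts in the pom (or on the cluster) disagree on version. A minimal sketch of the Scala side once the dependency versions line up (Spark 1.x assumed):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object CountWords {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CountWords")
    val sc = new SparkContext(conf)
    // HiveContext comes from spark-hive; its version (and spark-sql's)
    // must match spark-core's exactly.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("show tables").show()
    sc.stop()
  }
}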

Hive tables are created from Spark but are not visible in Hive

Submitted by 落爺英雄遲暮 on 2019-12-08 07:04:56
Question: From Spark, using:

DataFrame.write().mode(SaveMode.Ignore).format("orc").saveAsTable("myTableName")

the table is getting saved. I can see it using the command below (test is my database name):

hadoop fs -ls '/apps/hive/warehouse/test.db'
drwxr-xr-x - psudhir hdfs 0 2016-01-04 05:02 /apps/hive/warehouse/test.db/myTableName

But when I try to check the tables in Hive I cannot view them, not even with SHOW TABLES from the hiveContext.

Answer 1: sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/ This
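Note: without a shared hive-site.xml on Spark's classpath, Spark records table metadata in its own embedded Derby metastore, so the files land under the warehouse directory but Hive's metastore never hears about the table; copying the file, as in the answer, makes both sides use the same metastore. A quick check from the Spark side afterwards, assuming the same hiveContext:

// After restarting with the shared hive-site.xml, the table should be
// listed by the shared metastore.
hiveContext.sql("show tables in test").show()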

"INSERT INTO …" with SparkSQL HiveContext

Submitted by 拥有回忆 on 2019-12-03 14:56:35
Question: I'm trying to run an insert statement with my HiveContext, like this:

hiveContext.sql('insert into my_table (id, score) values (1, 10)')

The Spark 1.5.2 SQL documentation doesn't explicitly state whether this is supported or not, although it does support "dynamic partition insertion". This leads to a stack trace like:

AnalysisException: Unsupported language features in query: insert into my_table (id, score) values (1, 10) TOK_QUERY 0, 0,20, 0 TOK_FROM 0, -1,20, 0 TOK_VIRTUAL_TABLE 0, -1,20, 0
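Note: Spark 1.5's HiveQL parser does not accept INSERT INTO ... VALUES, hence the "Unsupported language features" error; the usual workaround is to build a one-row DataFrame and append it. A sketch in Scala (the question uses PySpark, but the idea carries over; my_table is taken from the question):

import org.apache.spark.sql.SaveMode

// Build a single-row DataFrame matching the table's (id, score) columns;
// insertInto resolves columns by position.
val row = hiveContext.createDataFrame(Seq((1, 10))).toDF("id", "score")
row.write.mode(SaveMode.Append).insertInto("my_table")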