spark-hive

Unable to view data of Hive tables after update in Spark

Submitted by  ̄綄美尐妖づ on 2019-12-24 13:54:03
Question: Case: I have a table HiveTest, which is an ORC table with transactions enabled. I loaded it in the Spark shell and viewed its data: var rdd = objHiveContext.sql("select * from HiveTest"); rdd.show() -- able to view data. I then went to my Hive shell (or Ambari) and updated the table, e.g. hive> update HiveTest set name='test' -- done, and successful; hive> select * from HiveTest -- able to view the updated data. But when I come back to Spark and run the query again, I cannot view any data except the column names: scala> var rdd1= …
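
A hedged note on this one: Spark caches table metadata, so refreshing the table is the first thing to try after an out-of-band Hive update. The snippet below is a minimal sketch against the Spark 1.x HiveContext API used in the question; it is not a confirmed fix.

    // Sketch (Spark 1.x): drop Spark's cached metadata for the table,
    // then re-run the query. objHiveContext is the HiveContext from the question.
    objHiveContext.refreshTable("HiveTest")
    val rdd1 = objHiveContext.sql("select * from HiveTest")
    rdd1.show()

    // If rows are still stale, the update may be sitting in uncompacted ACID
    // delta files that Spark's ORC reader cannot see; from the Hive shell, try:
    //   ALTER TABLE HiveTest COMPACT 'major';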

Spark Streaming + Hive

Submitted by ε祈祈猫儿з on 2019-12-24 00:42:31
Question: We are in the process of building an application that takes data from a source system through Flume and then, via the Kafka messaging system, into Spark Streaming for in-memory processing; after processing the data into a data frame we will put the data into Hive tables. The flow will be: Source System -> Flume -> Kafka -> Spark Streaming -> Hive. Is this the correct flow, or do we need to review it? We are taking a discretized stream and converting it into a data frame for SQL-compatible functions. Now we have 14 …
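
The flow itself is a common one. Below is a minimal sketch of the Kafka -> Spark Streaming -> Hive step, assuming Spark 1.x with the receiver-based Kafka connector and JSON payloads; the ZooKeeper quorum, consumer group, topic, and table names are illustrative, not from the question.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.SaveMode

    val sc   = new SparkContext(new SparkConf().setAppName("flume-kafka-hive"))
    val ssc  = new StreamingContext(sc, Seconds(30))
    val hive = new HiveContext(sc)

    // Receiver-based stream: (ZooKeeper quorum, consumer group, topic -> #threads)
    val stream = KafkaUtils.createStream(ssc, "zk1:2181", "etl-group", Map("events" -> 1))

    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val df = hive.read.json(rdd)                      // infer schema from JSON payloads
        df.write.mode(SaveMode.Append).saveAsTable("staging.events")
      }
    }

    ssc.start()
    ssc.awaitTermination()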

Missing hive-site when using spark-submit YARN cluster mode

Submitted by ↘锁芯ラ on 2019-12-17 16:49:14
Question: Using HDP 2.5.3, I've been trying to debug some YARN container classpath issues. Since HDP includes both Spark 1.6 and 2.0.0, there have been some conflicting versions. The users I support are able to run Spark2 with Hive queries successfully in YARN client mode, but not in cluster mode: there they get errors about tables not being found, because the Metastore connection isn't established. I am guessing that setting either --driver-class-path /etc/spark2/conf:/etc/hive/conf or …
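
A hedged pointer, since the question is truncated: in cluster mode the driver runs inside a YARN container that has no local /etc/spark2/conf, so hive-site.xml must be shipped with the job rather than found on a node-local path. Something like the following, where the class and jar names are placeholders:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files /etc/spark2/conf/hive-site.xml \
      --class com.example.MyApp \
      my-app.jar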

Spark Hive java.lang.LinkageError

Submitted by 不问归期 on 2019-12-13 02:18:55
Question: When executing DROP TABLE IF EXISTS in a Spark HiveContext, I get the error below. hiveContext.sql("DROP TABLE IF EXISTS table_name") throws: java.lang.LinkageError: ClassCastException: attempting to cast jar:file:/u/applic/data/hdfs7/hadoop/yarn/local/filecache/494/spark-hdp-assembly.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/u/applic/data/hdfs7/hadoop/yarn/local/filecache/494/spark-hdp-assembly.jar!/javax/ws/rs/ext/RuntimeDelegate.class. I'm using Spark 1.6 on HDP 2.4.
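
One workaround that has been reported for this RuntimeDelegate clash on HDP (an assumption here, since the question is truncated): the javax.ws.rs classes get loaded twice when Spark's YARN timeline-service client initializes, so disabling ATS integration on the Spark side avoids the cast. A sketch:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Disable the YARN ATS client whose Jersey dependency collides with the
    // copy inside spark-hdp-assembly.jar (a workaround, not a root-cause fix).
    val conf = new SparkConf()
      .setAppName("drop-table")
      .set("spark.hadoop.yarn.timeline-service.enabled", "false")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("DROP TABLE IF EXISTS table_name")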

How to set hive.metastore.warehouse.dir in HiveContext?

Submitted by 妖精的绣舞 on 2019-12-05 09:42:10
I'm trying to write a unit test case that relies on DataFrame.saveAsTable() (since it is backed by a file system). I point the Hive warehouse parameter to a local disk location: sql.sql(s"SET hive.metastore.warehouse.dir=file:///home/myusername/hive/warehouse"). By default, the embedded mode of the metastore should be enabled, and thus should not require an external database. But HiveContext seems to be ignoring this configuration, since I still get this error when calling saveAsTable(): MetaException(message:file:/user/hive/warehouse/users is not a directory or unable to create one) org.apache.hadoop.hive.ql …
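
One hedged explanation: a runtime SET is executed after the metastore client has already been initialized, so the value never reaches it. Configuring the property on the context before the first table operation, as sketched below for Spark 1.x, is the commonly suggested alternative; the warehouse path is the one from the question.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("warehouse-test"))
    val hiveContext = new HiveContext(sc)

    // Set the warehouse location before any table operation. Note that the
    // default database keeps the location it was created with, so a fresh
    // embedded metastore (e.g. deleting a stale derby metastore_db) may be
    // needed for the new path to take effect in tests.
    hiveContext.setConf("hive.metastore.warehouse.dir", "file:///home/myusername/hive/warehouse")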

Querying on multiple Hive stores using Apache Spark

Submitted by 一笑奈何 on 2019-11-30 02:56:07
I have a Spark application that successfully connects to Hive and queries Hive tables using the Spark engine. To build this, I just added hive-site.xml to the application's classpath, and Spark reads hive-site.xml to connect to its metastore. This method was suggested on Spark's mailing list. So far so good. Now I want to connect to two Hive stores, and I don't think adding another hive-site.xml to my classpath will help. I have referred to quite a few articles and Spark mailing-list threads but could not find anyone doing this. Can someone suggest how I can achieve this? Thanks. Docs referred: …
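
No answer is included in the excerpt, but one hedged workaround: a Spark context binds to a single metastore, so a second Hive deployment can instead be reached through its HiveServer2 JDBC endpoint and combined with tables from the primary one. Hostnames and table names below are placeholders, and Spark's JDBC source against Hive has known quirks, so treat this strictly as a sketch.

    // Primary metastore: queried natively through the HiveContext.
    val local = hiveContext.sql("SELECT id, name FROM primary_db.users")

    // Second Hive deployment: read over JDBC via HiveServer2 (requires the
    // hive-jdbc driver on the classpath; HiveServer2 may return column names
    // prefixed with the table name, which can need renaming before a join).
    val remote = hiveContext.read.format("jdbc")
      .option("url", "jdbc:hive2://second-hive-host:10000/default")
      .option("dbtable", "orders")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .load()

    local.join(remote, "id").show()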

Apache Spark Hive, executable JAR with Maven Shade

Submitted by 心已入冬 on 2019-11-29 10:38:20
I'm building an Apache Spark application with Apache Spark Hive. So far everything has been fine: I've been running the tests and the whole application in IntelliJ IDEA, and all tests together using Maven. Now I want to run the whole application from bash and let it run against a local single-node cluster. I'm using maven-shade-plugin to build a single executable JAR. The application crashes when it tries to create a new HiveContext from the SparkContext. The thrown exception tells me that Hive can't create the metastore because there is some problem with DataNucleus and its plugin system. I tried to follow several questions on how to run …
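
A hedged pointer, since the question is truncated: DataNucleus locates its plugins through per-jar plugin.xml and MANIFEST metadata, which maven-shade merges into a single file and thereby breaks plugin resolution. A commonly suggested workaround is to keep the DataNucleus jars out of the shaded artifact and hand them to spark-submit separately; the versions and class name below are illustrative.

    spark-submit \
      --class com.example.Main \
      --jars datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar,datanucleus-api-jdo-3.2.6.jar \
      --files hive-site.xml \
      app-shaded.jar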
