How to read/write a hive table from within the spark executors


Question


I have a requirement where I use a DStream to retrieve messages from Kafka. After receiving each RDD, I use a map operation to process the messages independently on the executors. The challenge I am facing is that I need to read from and write to a Hive table from within the executors, and for that I need access to a SQLContext. But as far as I know, SparkSession is available on the driver side only and should not be used within the executors, and without the SparkSession (in Spark 2.1.1) I can't get hold of a SQLContext. To summarize, my driver code looks something like this:

if (inputDStream_obj.isSuccess) {
  val inputDStream = inputDStream_obj.get
  inputDStream.foreachRDD(rdd => {
    if (!rdd.isEmpty) {
      // map is lazy; processMessage only runs once an action is applied
      val rdd1 = rdd.map(idocMessage => SegmentLoader.processMessage(props, idocMessage.value(), true))
      rdd1.count() // force evaluation so the messages are actually processed
    }
  })
}
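(For context, inputDStream_obj above is created along these lines. This is a minimal sketch assuming the spark-streaming-kafka-0-10 connector; the broker address, group id, and topic name are placeholders:)

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import scala.util.Try

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",               // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "idoc-consumer")                       // placeholder group id

// ssc is the StreamingContext created on the driver
val inputDStream_obj = Try(
  KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("idoc-topic"), kafkaParams)))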

After this rdd.map, the processing code runs on the executors, and there I have something like:

// This runs inside the executor, where no valid SparkSession exists
val sqlContext = spark.sqlContext
import sqlContext.implicits._
spark.sql("USE " + databaseName)
val result = Try(df.write.insertInto(tableName))

Passing the SparkSession or SQLContext to the executors fails every way I have tried it:

  • When I try to obtain the existing SparkSession on the executor: org.apache.spark.SparkException: A master URL must be set in your configuration

  • When I broadcast the session variable: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 9, <server>, executor 2): java.lang.NullPointerException

  • When I pass the SparkSession object directly: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 9, <server>, executor 2): java.lang.NullPointerException
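In case it helps frame an answer: my understanding is that the body of foreachRDD runs on the driver, and only the closures passed to RDD operations ship to the executors. So one restructuring I am considering looks like the sketch below, assuming SegmentLoader.processMessage could be changed to return a plain value instead of writing to Hive itself (IdocRow is a hypothetical placeholder type):

case class IdocRow(id: String, payload: String)  // hypothetical result type

inputDStream.foreachRDD(rdd => {
  if (!rdd.isEmpty) {
    // Only this closure ships to the executors; it must not touch the session
    val processed = rdd.map(msg => SegmentLoader.processMessage(props, msg.value(), true))

    // The rest of the foreachRDD body runs on the driver, where spark is valid
    import spark.implicits._
    val df = processed.toDF()        // works if processed is an RDD[IdocRow]
    spark.sql("USE " + databaseName)
    df.write.insertInto(tableName)
  }
})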

Let me know if you can suggest how to query/update a Hive table from within the executors. One other direction I have considered is sketched below.
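Would it be reasonable to bypass Spark SQL entirely on the executors and talk to HiveServer2 over plain JDBC from foreachPartition? A rough sketch, where the host, port, credentials, and table schema are all placeholders:

import java.sql.DriverManager

rdd.foreachPartition { partition =>
  // Runs on the executor: plain JDBC to HiveServer2, no SparkSession needed.
  // Connection details below are placeholders.
  val conn = DriverManager.getConnection(
    "jdbc:hive2://hiveserver:10000/" + databaseName, "user", "")
  try {
    val ps = conn.prepareStatement("INSERT INTO " + tableName + " VALUES (?)")
    partition.foreach { row =>
      ps.setString(1, row.toString)  // assumes a single string column
      ps.executeUpdate()
    }
    ps.close()
  } finally {
    conn.close()
  }
}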

Thanks, Ritwick

Source: https://stackoverflow.com/questions/44495803/how-to-read-write-a-hive-table-from-within-the-spark-executors
