Create a Hive table from a list of case class instances using Spark

Posted by 假装没事ソ on 2019-12-23 04:30:33

Question


I am trying to create a Hive table from a list of case class instances, but Spark does not let me specify the database name. The error below is thrown.

Spark Version: 1.6.2

Error: diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: Table not found: mytempTable; line 1 pos 58

Please let me know how to save the output of the map method to a Hive table with the same structure as the case class.

Note: the recordArray list is populated in the map method (in fact, in the getElem() method) for the given input.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    object testing extends Serializable {
      var recordArray = List[Record]()

      def main(args: Array[String]) {
        val inputpath = args(0).toString()
        val outputpath = args(1).toString()

        val conf = new SparkConf().setAppName("jsonParsing")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        val hsc = new HiveContext(sc)

        val input = sc.textFile(inputpath)
        //val input=sc.textFile("file:///Users/Documents/Work/data/mydata.txt")
        // input.collect().foreach(println)
        // The variable name is missing in the original post; `parsed` is a placeholder.
        val parsed = input.map(data => getElem(parse(data, false)))
        val recordRDD = sc.parallelize(recordArray)

        // The temp table is registered on sqlContext but queried through hsc.
        val recordDF = sqlContext.createDataFrame(recordRDD)
        recordDF.registerTempTable("mytempTable")
        hsc.sql("create table dev_db.ingestion as select * from mytempTable")
      }

      case class Record(summary_key: String, key: String, array_name_position: Int, Parent_Level_1: String, Parent_level_2: String, Parent_Level_3: String, Parent_level_4: String, Parent_level_5: String,
                        param_name_position: Integer, Array_name: String, paramname: String, paramvalue: String)
    }

Answer 1:


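You need to create a HiveContext and build the DataFrame through it. In the question, the temporary table is registered on sqlContext but queried through hsc, and in Spark 1.6 a temporary table is visible only to the context that registered it, which is why the query fails with "Table not found". With the Java API (where sc is a JavaSparkContext):

    import org.apache.spark.sql.hive.HiveContext;
    HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());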
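Since the question is written in Scala, here is a minimal Scala sketch of the same idea, keeping the question's `create table ... as select` statement. It assumes Spark 1.6 with Hive support and an existing `dev_db` database. The `Record` class below is a shortened stand-in for the one in the question, and the object name and sample row are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Shortened stand-in for the question's Record case class (assumption: fewer fields).
case class Record(summary_key: String, key: String, paramname: String, paramvalue: String)

object CtasViaHiveContext {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("jsonParsing"))
    val hsc = new HiveContext(sc)

    // Made-up sample row, only so the sketch has some data.
    val recordRDD = sc.parallelize(Seq(Record("s1", "k1", "name1", "value1")))

    // Create the DataFrame and register the temp table on the HiveContext ...
    val recordDF = hsc.createDataFrame(recordRDD)
    recordDF.registerTempTable("mytempTable")

    // ... so the same context can resolve it here. Assumes dev_db already exists.
    hsc.sql("create table dev_db.ingestion as select * from mytempTable")

    sc.stop()
  }
}
```

The only change relative to the question is that `createDataFrame`, `registerTempTable`, and `sql` all go through the same `HiveContext`; the separate `SQLContext` is no longer needed.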

Then save the DataFrame directly as a Hive table, or select only the columns you want to store first.

Here recordDF is the DataFrame:

    recordDF.write().mode("overwrite").saveAsTable("schemaName.tableName");

or

    recordDF.select(recordDF.col("col1"), recordDF.col("col2"), recordDF.col("col3")).write().mode("overwrite").saveAsTable("schemaName.tableName");

or

    recordDF.write().mode(SaveMode.Overwrite).saveAsTable("dbName.tableName");

The available SaveMode values are Append, Ignore, Overwrite, and ErrorIfExists (the default).
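In Scala, the `saveAsTable` route could look roughly like the sketch below. It also builds the RDD from the output of the transformation itself instead of a driver-side list: a `var` appended to inside `map` is modified on the executors, and `map` is lazy, so the list on the driver stays empty. This assumes Spark 1.6 with Hive support and an existing `dev_db` database. `toRecords` is a hypothetical replacement for the question's `getElem(parse(...))` that parses a simple comma-separated line, and `Record` is a shortened stand-in for the question's case class.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Shortened stand-in for the question's Record case class (assumption: fewer fields).
case class Record(summary_key: String, key: String, paramname: String, paramvalue: String)

object SaveRecordsToHive {
  // Hypothetical parser standing in for getElem(parse(...)): it returns the
  // records found in one input line instead of appending to a shared list.
  def toRecords(line: String): Seq[Record] =
    line.split(",") match {
      case Array(s, k, n, v) => Seq(Record(s, k, n, v))
      case _                 => Seq.empty // skip lines that do not have four fields
    }

  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("jsonParsing"))
    val hsc = new HiveContext(sc)

    // Each input line yields zero or more records; the records stay in the RDD.
    val recordRDD = sc.textFile(args(0)).flatMap(toRecords)

    // The column names and types are derived from the case class fields.
    val recordDF = hsc.createDataFrame(recordRDD)

    // Overwrite replaces an existing table; Append, Ignore and ErrorIfExists
    // are the other options.
    recordDF.write.mode(SaveMode.Overwrite).saveAsTable("dev_db.ingestion")

    sc.stop()
  }
}
```

Because the schema comes from the case class, the resulting table has the same structure as `Record`, and no temporary table is involved.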

I added here the definition for HiveContext from Spark Documentation,



Source: https://stackoverflow.com/questions/43840897/create-a-hive-table-from-list-of-case-class-using-spark
