How to create hive table from Spark data frame, using its schema?

Backend · Unresolved · 5 answers · 1795 views

渐次进展 2020-12-13 21:45

I want to create a Hive table using my Spark DataFrame's schema. How can I do that?

For fixed columns, I can use:

val CreateTable_query = "Create T         


        
5 Answers
  •  孤城傲影
    2020-12-13 22:16

    As per your question, it looks like you want to create a table in Hive using your DataFrame's schema. Since you say that DataFrame has many columns, there are two options:

    • 1st: create the Hive table directly from the DataFrame.
    • 2nd: take the schema of this DataFrame and create the table in Hive from it.

    Consider this code:

    package hive.example
    
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.SparkSession
    
    object checkDFSchema extends App {
      val sparkSession = SparkSession.builder().enableHiveSupport().getOrCreate()
      val sc = sparkSession.sparkContext
      // First option: create the Hive table directly from the DataFrame
      val DF = sparkSession.sql("select * from salary")
      DF.createOrReplaceTempView("tempTable")
      sparkSession.sql("Create table yourtable as select * from tempTable")
      // Second option: create the Hive table from the DataFrame's schema
      val oldDFF = sparkSession.sql("select * from salary")
      // Extract the schema from the DataFrame
      val schema = oldDFF.schema
      // Build an RDD of your data (the Row fields must match the schema)
      val rowRDD = sc.parallelize(Seq(Row(100, "a", 123)))
      // Create a new DataFrame from the data and the extracted schema
      val newDFwithSchema = sparkSession.createDataFrame(rowRDD, schema)
      newDFwithSchema.createOrReplaceTempView("tempTable")
      sparkSession.sql("create table FinalTable AS select * from tempTable")
    }
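
    A related sketch for the 2nd option: since Spark 2.4, `StructType` exposes `toDDL`, which renders the schema as a column-definition string, so you can build the CREATE TABLE statement directly from the DataFrame's schema without listing columns by hand. The table name `my_hive_table` below is a placeholder for this example:

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
        val df = spark.sql("select * from salary")

        // toDDL renders the schema as "`col1` INT, `col2` STRING, ..."
        val ddlColumns = df.schema.toDDL

        // "my_hive_table" is a hypothetical table name for this sketch
        spark.sql(s"CREATE TABLE IF NOT EXISTS my_hive_table ($ddlColumns)")

    Alternatively, `df.write.saveAsTable("my_hive_table")` creates a managed table with the DataFrame's schema (and its data) in a single call.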
    
