How to create hive table from Spark data frame, using its schema?

渐次进展 2020-12-13 21:45

I want to create a Hive table using my Spark DataFrame's schema. How can I do that?

For fixed columns, I can use:

val CreateTable_query = "Create T


        
5 answers
  •  执念已碎
    2020-12-13 21:58

    Another way is to use the methods available on StructType, such as sql, simpleString, and treeString.

    You can create DDL from a DataFrame's schema, and you can create a DataFrame's schema from your DDL.

    Here is one example (up to Spark 2.3):

        // Set up a sample test table to create a DataFrame from
        spark.sql("drop database if exists hive_test cascade")
        spark.sql("create database hive_test")
        spark.sql("use hive_test")
        spark.sql("""CREATE TABLE hive_test.department(
        department_id int,
        department_name string
        )
        """)
        // department_id is an int column, so insert a numeric literal
        spark.sql("""
        INSERT INTO hive_test.department values (101, 'Oncology')
        """)

        spark.sql("SELECT * FROM hive_test.department").show()
    

    Now I have a DataFrame to play with. In real cases you would use DataFrame readers to create a DataFrame from files or databases. Let's use its schema to create DDL.

        // Create DDL from the Spark DataFrame schema using simpleString

        // Regex matching the characters to strip: "struct<", ">" and ":"
        val sqlrgx = """(struct<)|(>)|(:)""".r

        // Build the column list by replacing those characters with spaces.
        // This only works for flat schemas: nested types such as array<int>
        // also contain ">" and would be damaged.
        val sqlString = sqlrgx.replaceAllIn(spark.table("hive_test.department").schema.simpleString, " ")

        // Create the table with sqlString
        spark.sql(s"create table hive_test.department2( $sqlString )")
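    To see what the regex step produces without a running Spark session, here is a minimal sketch in plain Scala. The input literals are assumed to have the shape `struct<name:type,...>` that `simpleString` returns for a StructType; the object and method names are hypothetical and only for illustration.

```scala
object FlatSchemaDdl {
  // Same pattern as in the answer: strips "struct<", ">" and ":".
  private val unwanted = """(struct<)|(>)|(:)""".r

  // Turns a simpleString-style schema into a column list by replacing
  // the matched characters with spaces and trimming the ends.
  def toColumnList(simpleString: String): String =
    unwanted.replaceAllIn(simpleString, " ").trim
}

// Flat schema: the result is a usable column list.
val flat = FlatSchemaDdl.toColumnList(
  "struct<department_id:int,department_name:string>")

// Nested type: the closing ">" of array<string> is stripped as well,
// so the result is no longer valid DDL.
val nested = FlatSchemaDdl.toColumnList("struct<id:int,tags:array<string>>")
```

    The second call shows why this approach is best kept to flat schemas: every `>` is removed, including the ones that belong to nested types.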
    

    From Spark 2.4 onwards, you can use the fromDDL and toDDL methods on StructType:

        import org.apache.spark.sql.types.StructType

        val fddl = """
          department_id int ,
          department_name string,
          business_unit string
          """

        // Easily create a StructType from a DDL string using fromDDL
        val schema3: StructType = StructType.fromDDL(fddl)

        // Create a DDL string from the StructType using toDDL
        val tddl = schema3.toDDL

        spark.sql("drop table if exists hive_test.department2 purge")

        // Create the table using the string tddl
        spark.sql(s"""create table hive_test.department2 ( $tddl )""")

        // Test by inserting a sample row and selecting it
        spark.sql("""
        INSERT INTO hive_test.department2 values (101, 'Oncology', 'MDACC Texas')
        """)
        spark.table("hive_test.department2").show()
        spark.sql("drop table hive_test.department2")
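    The example above builds its schema from a DDL string, but the same `toDDL` call can be made on an existing DataFrame's schema. A rough sketch of how the pieces might fit together follows; `createTableStatement` is a hypothetical helper (not part of Spark), and the `spark` session and `df` DataFrame in the trailing comment are assumed to exist.

```scala
object DdlHelper {
  // Hypothetical helper: wraps a column list (for example the string
  // returned by df.schema.toDDL on Spark 2.4+) in a CREATE TABLE statement.
  def createTableStatement(table: String, columnsDdl: String): String =
    s"CREATE TABLE IF NOT EXISTS $table ( ${columnsDdl.trim} )"
}

// With a SparkSession `spark` and a DataFrame `df` in scope (assumed):
//   spark.sql(DdlHelper.createTableStatement("hive_test.department2", df.schema.toDDL))
```

    Keeping the statement construction in a small pure function makes it easy to inspect the generated SQL before running it against the metastore.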
    
    
