How to create a Hive table from a Spark DataFrame, using its schema?

Backend · unresolved · 5 answers · 1802 views

渐次进展 · 2020-12-13 21:45

I want to create a Hive table using my Spark DataFrame's schema. How can I do that?

For fixed columns, I can use:

    val CreateTable_query = "Create T
5 Answers
  •  刺人心 (OP)
     2020-12-13 22:11

    Assuming you are using Spark 2.1.0 or later and `my_DF` is your DataFrame:

    //required imports
    import java.util.Arrays;
    import java.util.stream.Collectors;
    import org.apache.spark.sql.types.StructType;

    //get the schema as a string of comma-separated "field datatype" pairs
    StructType my_schema = my_DF.schema();
    String columns = Arrays.stream(my_schema.fields())
                           .map(field -> field.name() + " " + field.dataType().typeName())
                           .collect(Collectors.joining(","));

    //drop the table if it already exists
    spark.sql("drop table if exists my_table");
    //create the table using the dataframe schema
    spark.sql("create table my_table(" + columns + ") "
        + "row format delimited fields terminated by '|' location '/my/hdfs/location'");

    //write the dataframe data to the HDFS location backing the Hive table
    my_DF.write()
         .format("com.databricks.spark.csv")
         .option("delimiter", "|")
         .mode("overwrite")
         .save("/my/hdfs/location");
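
    The stream-and-join step above can be checked without a Spark cluster. The sketch below uses a plain `Field` record as a hypothetical stand-in for Spark's `StructField` (the names and types are made up for illustration), but the joining logic is the same:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SchemaToDdl {
    // hypothetical stand-in for a Spark StructField: just a name and a type name
    record Field(String name, String typeName) {}

    // join "name type" pairs with commas, mirroring my_DF.schema().fields()
    static String columnList(Field[] fields) {
        return Stream.of(fields)
                     .map(f -> f.name() + " " + f.typeName())
                     .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Field[] fields = { new Field("id", "integer"), new Field("name", "string") };
        String columns = columnList(fields);
        // the DDL string that would be passed to spark.sql(...)
        System.out.println("create table my_table(" + columns + ")");
        // prints: create table my_table(id integer,name string)
    }
}
```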
    

    The other method is to register the DataFrame as a temporary view and create the table with CTAS (create table as select):

    my_DF.createOrReplaceTempView("my_temp_table");
    spark.sql("drop table if exists my_table");
    spark.sql("create table my_table as select * from my_temp_table");
    
