How to create a Hive table from a Spark DataFrame, using its schema?

渐次进展 2020-12-13 21:45

I want to create a Hive table using my Spark DataFrame's schema. How can I do that?

For fixed columns, I can use:

    val CreateTable_query = "Create T


        
5 Answers
  •  不知归路
    2020-12-13 21:52

    Here is a PySpark version that creates a Hive table from Parquet files. You may have generated the Parquet files with an inferred schema and now want to push that definition to the Hive metastore. You can also push the definition to a system like AWS Glue or AWS Athena, not just to the Hive metastore. Here I am using spark.sql to create the permanent table.

        # Location where my parquet files are present.
        df = spark.read.parquet("s3://my-location/data/")

        buf = []
        buf.append('CREATE EXTERNAL TABLE test123 (')

        keyanddatatypes = df.dtypes          # list of (column name, data type) pairs
        sizeof = len(keyanddatatypes)
        print("size----------", sizeof)

        count = 1
        for eachvalue in keyanddatatypes:
            print(count, sizeof, eachvalue)
            if count == sizeof:
                # last column: no trailing comma
                total = str(eachvalue[0]) + ' ' + str(eachvalue[1])
            else:
                total = str(eachvalue[0]) + ' ' + str(eachvalue[1]) + ','
            buf.append(total)
            count = count + 1

        buf.append(')')
        buf.append('STORED AS PARQUET')
        buf.append("LOCATION 's3://my-location/data/'")
        ## partition by pt

        tabledef = ' '.join(buf)

        print("---------print definition ---------")
        print(tabledef)

        ## create the table using spark.sql. Assuming you are using Spark 2.1+
        spark.sql(tabledef)
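
    If you prefer something more compact, the same DDL can be built in one expression from df.dtypes. This is only a sketch, assuming the same hypothetical table name (test123) and S3 path as above:

        # Minimal sketch of the same idea: build the column list with a join
        # over df.dtypes instead of a counter-based loop.
        df = spark.read.parquet("s3://my-location/data/")

        columns = ", ".join("{} {}".format(name, dtype) for name, dtype in df.dtypes)
        ddl = ("CREATE EXTERNAL TABLE IF NOT EXISTS test123 ({}) "
               "STORED AS PARQUET "
               "LOCATION 's3://my-location/data/'".format(columns))
        spark.sql(ddl)

    If a managed table is enough (data copied into the warehouse rather than left at the external S3 location), df.write.saveAsTable("test123") registers the schema and writes the data in one step.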
    
