Pandas DataFrame in PySpark to Hive

Backend · Unresolved · 3 answers · 1509 views

时光说笑  2021-01-04 21:13

How to send a pandas dataframe to a hive table?

I know that if I have a Spark DataFrame, I can register it as a temporary table using

df.registerTempTable("table_name")
3 Answers
  •  粉色の甜心
    2021-01-04 21:38

    First, you need to convert the pandas DataFrame to a Spark DataFrame:

    # sc is an existing SparkContext
    from pyspark.sql import HiveContext
    hive_context = HiveContext(sc)
    df = hive_context.createDataFrame(pd_df)
    

    Then you can register it as an in-memory temporary table:

    df.registerTempTable('tmp')
    

    Now you can use HiveQL to save the data into Hive:

    hive_context.sql("""insert overwrite table target partition(p='p') select a, b from tmp""")
    

    Note: the same hive_context must be used throughout all of these steps!
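
    On Spark 2.x and later, `HiveContext` is deprecated in favor of `SparkSession`, and `registerTempTable` is replaced by `createOrReplaceTempView`. Below is a minimal sketch of the same flow with the modern API; the column names, table name `tmp`, and local master setting are illustrative assumptions, and the `INSERT OVERWRITE` step from the answer above would additionally require a session built with `enableHiveSupport()`:

    ```python
    # Sketch: pandas -> Spark DataFrame -> temp view, using SparkSession (Spark 2.x+).
    # Assumes pyspark and pandas are installed; names here are illustrative.
    import pandas as pd
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")            # local session for demonstration
        .appName("pandas_to_hive")
        .getOrCreate()
    )

    pd_df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

    df = spark.createDataFrame(pd_df)   # pandas DataFrame -> Spark DataFrame
    df.createOrReplaceTempView("tmp")   # modern replacement for registerTempTable

    # With a Hive-enabled session (SparkSession.builder.enableHiveSupport()),
    # the INSERT OVERWRITE from the answer above works the same way.
    # Here we just query the temp view to confirm the round trip:
    result = spark.sql("select a, b from tmp where a > 1 order by a").collect()
    ```

    As with the `HiveContext` version, the temp view is tied to the session that created it, so keep one `SparkSession` for the whole flow.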
