I'm new to Python and Hive.
I was hoping I might get some advice.
Does anyone have any tips on how to turn a python pandas dataframe into a hive table?
Your script should run on a machine where Hive can load data using the "LOAD DATA LOCAL INPATH" statement.
1.- Query the pandas DataFrame to build a list of column names and their data types.
2.- Compose a valid HQL (DDL) CREATE TABLE statement using Python string operations (basically concatenation).
3.- Issue the CREATE TABLE statement in Hive.
4.- Write the pandas DataFrame as a CSV file separated by "\t", with headers off and index off (check the parameters of to_csv()).
5.- From your Python script, call a system console running hive -e:
For instance:
import subprocess
# str_command_list is a string holding the HQL statement(s) to run
p = subprocess.Popen(['hive', '-e', str_command_list],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
This calls the Hive console and executes, for instance, a LOAD DATA LOCAL INPATH statement, which loads your CSV data into the created table.
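Steps 1, 2 and 4 could be sketched roughly as follows. This is only an illustration, not a tested recipe: the dtype-to-Hive mapping is my own assumption (anything unlisted falls back to STRING), and the helper names build_create_table and write_tsv are hypothetical.

```python
# Assumed mapping from pandas dtype names to Hive column types.
# Anything not listed here falls back to STRING.
PANDAS_TO_HIVE = {
    "int64": "BIGINT",
    "int32": "INT",
    "float64": "DOUBLE",
    "float32": "FLOAT",
    "object": "STRING",
}


def build_create_table(table_name, columns):
    """Compose a CREATE TABLE statement from (column name, dtype name) pairs."""
    col_defs = ", ".join(
        "`{}` {}".format(name, PANDAS_TO_HIVE.get(dtype, "STRING"))
        for name, dtype in columns
    )
    # The table is declared tab-delimited so it matches the file from write_tsv.
    return (
        "CREATE TABLE IF NOT EXISTS {} ({}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        "STORED AS TEXTFILE"
    ).format(table_name, col_defs)


def write_tsv(df, path):
    """Write a DataFrame as tab-separated text with no header row and no index."""
    df.to_csv(path, sep="\t", header=False, index=False)
```

With a DataFrame df, the pairs could be produced with [(c, str(t)) for c, t in df.dtypes.items()], and the resulting statement passed to hive -e for step 3.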
Then you are happy.
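One caveat: Popen does not raise an error when the command fails, so it is probably worth checking the return code before celebrating. A minimal sketch, assuming the hive CLI exits with a non-zero status on failure; build_load_statement and run_hive are hypothetical helper names, and the statement appends to the table (add OVERWRITE if you want to replace its contents):

```python
import subprocess


def build_load_statement(tsv_path, table_name):
    """Compose the HQL that loads a local tab-separated file into a table."""
    return "LOAD DATA LOCAL INPATH '{}' INTO TABLE {}".format(tsv_path, table_name)


def run_hive(statement, hive_bin="hive"):
    """Run one HQL statement through `hive -e`; raise if the exit code is non-zero."""
    p = subprocess.Popen(
        [hive_bin, "-e", statement],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = p.communicate()
    if p.returncode != 0:
        # Surface Hive's stderr so the failure is visible to the caller.
        raise RuntimeError(err.decode(errors="replace"))
    return out.decode(errors="replace")
```

Usage would look like run_hive(build_load_statement("/tmp/my_table.tsv", "my_table")), where the path and table name are placeholders.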
Source: https://stackoverflow.com/questions/23817958/pandas-dataframe-to-hive-table