Pandas DataFrame to Hive Table

Submitted by 拟墨画扇 on 2019-12-04 10:51:45

Your script should run on a machine where Hive can load data with the LOAD DATA LOCAL INPATH statement, since LOCAL makes Hive read the file from the filesystem of the machine running the Hive client.

  1. Query the pandas DataFrame to build a list of column names and their data types.

  2. Compose a valid HQL (DDL) CREATE TABLE statement using Python string operations (basically concatenation).

  3. Issue the CREATE TABLE statement in Hive.

  4. Write the pandas DataFrame as a CSV file separated by "\t", with headers and index turned off (check the parameters of to_csv()).

  5. From your Python script, start a system process that runs hive -e, for instance:


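import subprocess

# str_command_list holds the HQL to run, e.g. a LOAD DATA LOCAL INPATH statement
p = subprocess.Popen(['hive', '-e', str_command_list],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()  # wait for Hive to finish and collect its output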

This calls the Hive console and executes, for instance, a LOAD DATA LOCAL INPATH statement, which loads your CSV data into the table you created.
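Steps 1 to 4 can be sketched roughly as follows. This is only an illustration under assumptions: the dtype-to-Hive type mapping, the helper names build_create_table and write_tsv, and the table name demo_table are all made up for the example and are not part of the original answer. Anything that is not numeric, boolean, or a timestamp falls back to STRING here.

```python
import os
import tempfile

import pandas as pd

# Assumed mapping from NumPy dtype "kind" codes to Hive column types.
# This is a simplification; adjust it to the types your data really needs.
HIVE_TYPES = {
    "i": "BIGINT",     # signed integers
    "u": "BIGINT",     # unsigned integers
    "f": "DOUBLE",     # floats
    "b": "BOOLEAN",    # booleans
    "M": "TIMESTAMP",  # datetime64
}


def build_create_table(df, table_name):
    """Steps 1 and 2: read column names/dtypes and compose the DDL string."""
    columns = []
    for name, dtype in df.dtypes.items():
        hive_type = HIVE_TYPES.get(dtype.kind, "STRING")  # fallback: STRING
        columns.append("`{}` {}".format(name, hive_type))
    return (
        "CREATE TABLE IF NOT EXISTS {} ({}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        "STORED AS TEXTFILE".format(table_name, ", ".join(columns))
    )


def write_tsv(df, path):
    """Step 4: tab-separated output with no header row and no index column."""
    df.to_csv(path, sep="\t", header=False, index=False)


# Small hypothetical DataFrame used only to demonstrate the helpers.
df = pd.DataFrame({"id": [1, 2], "score": [0.5, 1.5], "name": ["a", "b"]})
ddl = build_create_table(df, "demo_table")

tsv_path = os.path.join(tempfile.mkdtemp(), "demo_table.tsv")
write_tsv(df, tsv_path)
```

The resulting ddl string is what step 3 would pass to Hive, and tsv_path is the file that the LOAD DATA LOCAL INPATH statement would point at. The field delimiter in the DDL has to match the separator given to to_csv(), otherwise Hive will read every row as a single column.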

Then you are happy.
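The snippet above uses str_command_list without defining it. One possible way to assemble that string and to notice when Hive fails is sketched below. The helper names build_load_command and run_hive, the table name, and the file path are hypothetical, and the sketch assumes the path and table name contain no quote characters.

```python
import subprocess


def build_load_command(table_name, tsv_path):
    """Compose the HQL that loads a local tab-separated file into a table."""
    # Assumes tsv_path and table_name contain no single quotes or backticks.
    return "LOAD DATA LOCAL INPATH '{}' INTO TABLE {};".format(
        tsv_path, table_name
    )


def run_hive(hql, hive_bin="hive"):
    """Run an HQL string through `hive -e` and raise if Hive exits non-zero."""
    proc = subprocess.Popen(
        [hive_bin, "-e", hql],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8", errors="replace")


# Hypothetical table name and path, for illustration only.
str_command_list = build_load_command("demo_table", "/tmp/demo_table.tsv")
# run_hive(str_command_list)  # uncomment on a machine where hive is installed
```

Checking the return code matters because communicate() returns normally even when Hive reports an error, so a failed load would otherwise pass silently.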
