Hadoop - Load Hive tables using PIG

Submitted by 北城以北 on 2019-12-13 07:56:04

Question


I want to load Hive tables using Pig. I think this can be done through HCatLoader, but my input is XML files, which I have to load with XMLLoader. Can I use both options when loading XML files in Pig?

I extract the data from the XML files using my own UDF, and once all the data is extracted, I have to load it from Pig into Hive tables.

I can't use Hive to extract the XML data because the XML I receive is quite complex, so I wrote my own UDF to parse it. Any suggestions or pointers on how to load Hive tables from Pig data?

I am using AWS.


Answer 1:


You can STORE the loaded data as a delimited text file (for example, comma-delimited) and then create an external table in Hive that points to the file's location.

Create external table YOURTABLE (schema)
row format delimited
fields terminated by ','
location '/your/file/directory';
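The Pig side of this approach might look like the sketch below. It assumes the same placeholder jar path, input path, and `xmlUDF` loader name used elsewhere in this thread, and it writes to the directory that the external table's `location` clause points at; all of these are stand-ins, not real values.

```pig
-- Placeholder jar, input path, and loader name; substitute your own.
REGISTER 's3n://bucket/path/xmlUDF.jar';
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF();

-- Write comma-delimited part files into the directory the external table points at.
-- STORE fails if the output directory already exists.
STORE xml INTO '/your/file/directory' USING PigStorage(',');
```

If any field can itself contain a comma, a different delimiter would be safer, with the `fields terminated by` clause of the table changed to match.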



Answer 2:


You can store data from Pig into Hive tables using HCatStorer. For example:

REGISTER 's3n://bucket/path/xmlUDF.jar';
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF();
STORE xml INTO 'database.table' USING org.apache.hive.hcatalog.pig.HCatStorer();
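A slightly fuller sketch, under stated assumptions: the field names `id` and `name` and the table `database.lookup_table` are hypothetical, and the actual schema depends on what the XML UDF emits. It illustrates that one script can contain more than one LOAD with different loaders, and that HCatStorer expects the relation's field names to line up with the target table's column names. The target Hive table is assumed to exist already.

```pig
-- Launch with the HCatalog jars on the classpath, e.g.: pig -useHCatalog script.pig
REGISTER 's3n://bucket/path/xmlUDF.jar';

-- Hypothetical schema; the real fields depend on the UDF's output.
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF() AS (id:chararray, name:chararray);

-- A second LOAD in the same script can read an existing Hive table (hypothetical name).
lookup = LOAD 'database.lookup_table' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Project the fields so their names match the target table's columns.
out = FOREACH xml GENERATE id, name;
STORE out INTO 'database.table' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

The `lookup` relation is shown only to illustrate mixing loaders; it would normally feed a JOIN or similar step before the STORE.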

Your question isn't quite clear. Are you hoping to work with both the XML and Hive data within Pig, do some processing, and then store the result in Hive? Or are you just trying to store the XML data in Hive and work with it there?



Source: https://stackoverflow.com/questions/32921201/hadoop-load-hive-tables-using-pig
