Multi-line JSON read using Apache PIG

Submitted by 谁说胖子不能爱 on 2019-12-11 02:06:26

Question


I have a JSON file and want to read it using Apache Pig.

I tried the regular JsonLoader, but it looks like JsonLoader works only with single-line JSON. Then I tried Elephant-Bird, but I still do not get the correct results. Can anyone suggest a solution?

Input:

{"employees":[                                          
         {"firstName":"John", "lastName":"Doe"},              
         {"firstName":"Anna", "lastName":"Smith"},                      
         {"firstName":"Peter", "lastName":"Jones"}             
]}      

Note: I don't want to convert the input into a single line.

Script:

A = LOAD 'input' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');       
B = FOREACH A GENERATE FLATTEN($0#'employees');    
Dump B;

Expected result:

([firstName#John,lastName#Doe])                                      
([firstName#Anna,lastName#Smith])                                 
([firstName#Peter,lastName#Jones])  

Answer 1:


As mentioned in the comments by siva, the answer is basically that you do need to change your input to a single line.

JsonLoader and the Elephant-Bird loader work only with single-line JSON; they do not handle records that span multiple lines. You need to convert your input to a single line before passing it to Pig. One workaround is to write a shell script that first collapses the multi-line JSON into a single line with the sed command and then calls the Pig script.
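
The conversion step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the sed approach the answer suggests, and it assumes the whole file is one JSON document small enough to fit in memory. The function names and the file names in the usage note are hypothetical and not part of Pig or Elephant-Bird.

```python
import json
from pathlib import Path


def to_single_line(text: str) -> str:
    """Parse one multi-line JSON document and re-serialize it on one line."""
    # json.dumps without an indent argument never emits newlines,
    # so the result is always a single-line record.
    return json.dumps(json.loads(text), separators=(",", ":"))


def convert_file(src: str, dst: str) -> None:
    """Write a single-line copy of the JSON document in src to dst."""
    compact = to_single_line(Path(src).read_text(encoding="utf-8"))
    # End with a newline so the loader sees one complete line.
    Path(dst).write_text(compact + "\n", encoding="utf-8")
```

For example, calling convert_file("input.json", "input_single_line.json") before running the Pig script would produce a file that the LOAD statement can point at instead of the original. Parsing and re-serializing, rather than deleting newline characters, also fails early if the input is not valid JSON.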



Source: https://stackoverflow.com/questions/28653431/multi-line-json-read-using-apache-pig
