Question
I have a JSON file that I want to read using Apache Pig. I tried the regular JSONLOADER, but it looks like JSONLOADER works only with single-line JSON. Then I tried Elephant-Bird, but I am still not able to see the results correctly. Can anyone please suggest a solution?
Input:
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
Note: I don't want to convert the input into a single line.
Script:
A = LOAD 'input' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'employees');
Dump B;
Expected result:
([firstName#John,lastName#Doe])
([firstName#Anna,lastName#Smith])
([firstName#Peter,lastName#Jones])
Answer 1:
As mentioned in the comments by siva, the answer is basically that you do need to change your input to a single line.
JsonLoader and the Elephant-Bird loader work only with single-line JSON, because they treat each line of the input as a separate record. They will not work with multi-line input, so you need to convert your input to a single line before passing it to Pig. One workaround is to write a shell script that first collapses the multi-line input into a single line using the sed command and then calls the Pig script. This link will help you call Pig from a shell script.
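A minimal sketch of that wrapper script, assuming the input file holds exactly one JSON document. It uses tr rather than sed to strip the line breaks, which is a substitution made here for brevity. The function name flatten_json and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Collapse a multi-line JSON document read from stdin into a single line
# on stdout, so a line-oriented loader sees it as one record.
# Assumption: stdin contains exactly one JSON document.
flatten_json() {
    # Valid JSON cannot contain raw line breaks inside string values,
    # so deleting CR/LF characters only removes formatting whitespace.
    tr -d '\r\n'
    # End the record with a single newline.
    echo
}

# Hypothetical usage (file and script names are placeholders):
#   flatten_json < input.json > input_single_line.json
#   pig script.pig
```

The Pig script would then LOAD the single-line file instead of the original one. If the original file must stay untouched, writing the flattened copy to a separate path as above keeps it intact.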
Source: https://stackoverflow.com/questions/28653431/multi-line-json-read-using-apache-pig