Question
Hi, I am trying to load the following data (it includes different delimiters and is unstructured) into Pig using Pig Latin only, without preprocessing the data with, e.g., Java.
Input:
1234 #one,#two,#three
5679 #one,#two
1234 #one
Desired output:
1234 #one
1234 #two
1234 #three
5679 #one
5679 #two
1234 #one
Any ideas? Is this even possible in Pig? Thanks a lot in advance!
Answer 1:
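Yes, this is possible in plain Pig Latin: load each line with a space delimiter, split the second field on commas with TOKENIZE, and un-nest the resulting bag with FLATTEN.
Pig script:
-- Split each line on the space into a key and a comma-separated value string.
A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray);
-- TOKENIZE splits value on ',' into a bag; FLATTEN emits one row per bag element.
B = FOREACH A GENERATE key, FLATTEN(TOKENIZE(value, ','));
DUMP B;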
Input (a.csv):
1234 #one,#two,#three
5679 #one,#two
1234 #one
Output of DUMP B:
(1234,#one)
(1234,#two)
(1234,#three)
(5679,#one)
(5679,#two)
(1234,#one)
Source: https://stackoverflow.com/questions/31061504/loading-unstructered-data-with-different-delimiters-in-pig-using-piglatin-only