Loading unstructured data with different delimiters in Pig using Pig Latin only

Posted by 我的梦境 on 2019-12-13 05:49:20

Question


Hi, I am trying to load the following unstructured data, which mixes different delimiters, into Pig using Pig Latin only, without preprocessing it in another language such as Java.

Input:

1234 #one,#two,#three
5679 #one,#two
1234 #one

Desired output:

1234 #one
1234 #two
1234 #three
5679 #one
5679 #two
1234 #one

Any ideas? Is this even possible in Pig? Thanks a lot in advance!


Answer 1:


Pig script:

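-- Split each line on the space into a key and the comma-separated remainder.
A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray);
-- TOKENIZE splits value on ',' into a bag; FLATTEN emits one row per element.
B = FOREACH A GENERATE key, FLATTEN(TOKENIZE(value, ','));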
DUMP B;

Input (a.csv):

1234 #one,#two,#three
5679 #one,#two
1234 #one

Output of DUMP B:

(1234,#one)
(1234,#two)
(1234,#three)
(5679,#one)
(5679,#two)
(1234,#one)
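
DUMP prints each record in Pig's tuple notation, whereas the question asks for space-separated lines. A minimal sketch of one way to get that form is to write the relation out with PigStorage(' ') instead of dumping it. The output directory name 'out' and the field alias tag are assumptions chosen for illustration, and 'out' must not already exist when the script runs.

```pig
-- Same load as in the answer: space-delimited key, then comma-separated values.
A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray);

-- One output row per comma-separated token; 'tag' is a hypothetical alias.
B = FOREACH A GENERATE key, FLATTEN(TOKENIZE(value, ',')) AS tag;

-- Write key and tag separated by a single space; 'out' is a hypothetical path.
STORE B INTO 'out' USING PigStorage(' ');
```

The part files under 'out' should then hold one key and one token per line, separated by a space.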


Source: https://stackoverflow.com/questions/31061504/loading-unstructered-data-with-different-delimiters-in-pig-using-piglatin-only
