Merge two lines in Pig

Submitted by 老子叫甜甜 on 2019-12-23 22:12:37

Question


I would like to write a Pig script for the following problem.

Input is:

ABC,DEF,,
,,GHI,JKL
MNO,PQR,,
,,STU,VWX

Output should be:

ABC,DEF,GHI,JKL
MNO,PQR,STU,VWX

Could anyone please help me?


Answer 1:


This problem is difficult to solve with native Pig alone. One option is to download the datafu-1.2.0.jar library and try the approach below.

input.txt

ABC,DEF,,
,,GHI,JKL
MNO,PQR,,
,,STU,VWX

PigScript:

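-- Register DataFu and alias its BagSplit UDF.
REGISTER /tmp/datafu-1.2.0.jar;
DEFINE BagSplit datafu.pig.bags.BagSplit();

A = LOAD 'input.txt' USING PigStorage(',') AS (f1,f2,f3,f4);
-- Collect every row into a single bag.
B = GROUP A ALL;
-- Split that bag into bags of two rows each, one output record per pair.
C = FOREACH B GENERATE FLATTEN(BagSplit(2,$1)) AS mybag;
-- Join each pair into one '_'-delimited string, drop the run of four nulls, then split into four fields.
D = FOREACH C GENERATE FLATTEN(STRSPLIT(REPLACE(BagToString(mybag),'_null_null_null_null',''),'_',4));
-- Reorder the fields into the expected column order.
E = FOREACH D GENERATE $2,$3,$0,$1;
DUMP E;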

Output:

(MNO,PQR,STU,VWX)
(ABC,DEF,GHI,JKL)

Note: This approach assumes the input always follows the pattern shown above: in the 1st row the last two columns are null, in the 2nd row the first two columns are null, and the same holds for the 3rd and 4th rows.
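
To make the pairing assumption in the note concrete, here is a minimal sketch of the same merge logic in plain Python rather than Pig. It is only an illustration for checking the expected result on a small sample, not part of the original answer; the function name merge_row_pairs is hypothetical, and it assumes rows arrive in file order and that each column is filled in exactly one row of a pair.

```python
def merge_row_pairs(lines, sep=","):
    """Merge each consecutive pair of rows, keeping the non-empty cell per column.

    Assumption: rows come in pairs (1st with 2nd, 3rd with 4th, ...), and for
    every column exactly one row of the pair holds a value.
    """
    rows = [line.split(sep) for line in lines]
    merged = []
    # Walk the rows two at a time; an unpaired trailing row is ignored by zip.
    for first, second in zip(rows[0::2], rows[1::2]):
        # For each column, keep whichever of the two cells is non-empty.
        merged.append(sep.join(a or b for a, b in zip(first, second)))
    return merged


lines = ["ABC,DEF,,", ",,GHI,JKL", "MNO,PQR,,", ",,STU,VWX"]
for row in merge_row_pairs(lines):
    print(row)
```

On the sample input this prints the two merged rows from the question. If the rows do not strictly alternate as described, both this sketch and the Pig script would need a different way of deciding which rows belong together.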



Source: https://stackoverflow.com/questions/27628171/merge-two-lines-in-pig
