How to combine/concat two bags in Pig Latin

六眼飞鱼酱① submitted on 2019-12-12 14:21:27

Question


I have two datasets:

A = {uid, url}; B = {uid, url};

Now I do a COGROUP:

C = COGROUP A BY uid, B BY uid;

and I want to turn C into {group AS uid, DISTINCT A.url+B.url};

My question is: how do I concatenate the two bags A.url and B.url?

Or, to put it differently, how do I do a DISTINCT across multiple columns?


Answer 1:


This may not be what you're expecting, but here is what I understood from your question:

C = JOIN A BY uid, B BY uid;
D = DISTINCT C;

Concatenating two fields (as strings, not as bags) is done as follows:

E = FOREACH D GENERATE CONCAT(A::uid,B::uid); 



Answer 2:


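-- Load both datasets with the same (uid, url) layout.
A = LOAD 'A' USING PigStorage() AS (uid, url);
B = LOAD 'B' USING PigStorage() AS (uid, url);
-- Inner join: only uids present in both A and B survive.
C = JOIN A BY uid, B BY uid;
-- $0 is A::uid; CONCAT glues the two url values into one string.
D = FOREACH C GENERATE $0, CONCAT(A::url, B::url);
-- Drop duplicate (uid, concatenated url) rows.
E = DISTINCT D;
DUMP E;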
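
Note that JOIN plus CONCAT yields one concatenated string per matching pair of rows, not a single bag of distinct urls per uid. If the goal is the {uid, distinct urls from A and B} shape described in the question, one possible approach (a sketch that is not taken from the answers above) is to UNION the two relations first and then GROUP by uid, using a nested DISTINCT inside FOREACH. The file names 'A' and 'B', the chararray types, and the aliases AB, G, R, all_urls and uniq_urls are assumptions made for illustration.

```pig
-- Assumption: input files 'A' and 'B' in PigStorage's default tab-separated format.
A = LOAD 'A' USING PigStorage() AS (uid:chararray, url:chararray);
B = LOAD 'B' USING PigStorage() AS (uid:chararray, url:chararray);

-- Stack the two relations so every (uid, url) pair lands in one relation.
AB = UNION A, B;

-- Collect all urls per uid, then de-duplicate inside each group.
G = GROUP AB BY uid;
R = FOREACH G {
    all_urls = AB.url;             -- bag of every url seen for this uid
    uniq_urls = DISTINCT all_urls; -- nested DISTINCT removes repeats
    GENERATE group AS uid, uniq_urls AS urls;
};

DUMP R;
```

Unlike the JOIN-based version, this keeps uids that appear in only one of the two datasets. If only the distinct (uid, url) pairs are needed, without grouping them into a bag, applying DISTINCT directly to AB should be enough.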


Source: https://stackoverflow.com/questions/10661389/how-to-combine-concat-two-bags-in-pig-latin
