问题
I have two datasets:
A = {uid, url}; B = {uid, url};
now I do a cogroup
:
C = COGROUP A BY uid, B BY uid;
and I want to change C into {group AS uid, DISTINCT A.url+B.url
};
My question is how do I do this concatenation of two bags A.url and B.url?
Or to put it differently, how do I do DISTINCT
on multiple columns?
回答1:
It cannot be what you're expecting but that's what I understood from your question:
C = JOIN A BY uid, B BY uid;
D = DISTINCT C;
Concatenation is done the following way:
E = FOREACH D GENERATE CONCAT(A::uid,B::uid);
回答2:
A = LOAD 'A' using PigStorage() as (uid,url);
B = LOAD 'B' using PigStorage() as (uid,url);
C = JOIN A by uid ,B by uid;
D = FOREACH C GENERATE $0,CONCAT(A::url,B::url);
E= DISTINCT D;
dump E;
来源:https://stackoverflow.com/questions/10661389/how-to-combine-concat-two-bags-in-pig-latin