How To Find All Possible Permutations From A Bag In Apache Pig

Submitted by 三世轮回 on 2019-12-11 09:59:52

Question


I'm trying to find all possible combinations using Apache Pig. I was able to generate the permutations, but I want to eliminate the duplicated pairs. I wrote this code:

A = LOAD 'data' AS (f1:chararray);
DUMP A;
('A')
('B')
('C')
B = FOREACH A GENERATE $0 AS v1;
C = FOREACH A GENERATE $0 AS v2;
D = CROSS B, C;

The result I obtained is:

DUMP D;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')

But the result I'm trying to obtain looks like this:

DUMP R;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')

How can I do this? I would rather avoid comparing the strings, because the same string may occur on more than one line.


Answer 1:


You can FILTER the cross product to remove the rows you don't want. For example:

A = load 'testdata.txt';
B = foreach A generate $0;
C = Cross A, B;
-- keep a pair only when its first value sorts at or before its second
D = filter C by $0 <= $1;
dump D;

which prints out

(C,C)
(B,C)
(B,B)
(A,C)
(A,B)
(A,A)

when 'testdata.txt' contains

A
B
C
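
The question mentions that the same string may appear on more than one line. With the filter above alone, a repeated input row would produce repeated pairs (two 'A' rows give (A,A) four times). A minimal sketch of one way to handle that, assuming repeated strings should count as a single value: apply DISTINCT before the CROSS. The relation names U and R are illustrative and not from the original answer, and the sketch still relies on the string comparison in the filter.

```pig
-- Sketch only: assumes 'testdata.txt' holds one string per line, possibly repeated.
A = LOAD 'testdata.txt' AS (f1:chararray);
-- Collapse repeated strings so each value appears once.
U = DISTINCT A;
-- Two renamed copies of the column, as in the question.
B = FOREACH U GENERATE f1 AS v1;
C = FOREACH U GENERATE f1 AS v2;
-- All ordered pairs of distinct values.
D = CROSS B, C;
-- Keep one ordering of each pair, including pairs of a value with itself.
R = FILTER D BY $0 <= $1;
DUMP R;
```

Pig does not guarantee the order of the dumped rows; add an ORDER step before DUMP if a sorted listing is needed.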


Source: https://stackoverflow.com/questions/25408179/how-to-find-all-possible-permutations-from-a-bag-under-apache-pig
