How to perform a DISTINCT in Pig Latin on a subset of columns?

广开言路 2020-12-30 07:06

I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach:

You cannot us

6 answers
  •  挽巷
    挽巷 (original poster)
    2020-12-30 07:31

    Group on all the other columns (here, a4), project just the columns of interest (a1, a2, a3) into a bag, apply DISTINCT to that bag, and then use FLATTEN to expand the result back into rows:

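    A_unique =
        FOREACH (GROUP A BY a4) {
            -- project only the columns to deduplicate into a bag
            b = A.(a1,a2,a3);
            -- drop duplicate (a1,a2,a3) tuples within this group
            s = DISTINCT b;
            -- expand the bag back into rows and restore the grouping column
            GENERATE FLATTEN(s), group AS a4;
        };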
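
To see what the group / project / DISTINCT / FLATTEN steps above compute, here is a rough emulation in plain Python. It is only a sketch of the semantics, not Pig itself: the function name `distinct_within_groups` and the sample rows are made up for illustration, and the sort is there only to make the printed order deterministic (Pig does not guarantee an output order).

```python
from collections import defaultdict


def distinct_within_groups(rows):
    """Emulate: GROUP A BY a4, DISTINCT over (a1, a2, a3), then FLATTEN."""
    groups = defaultdict(list)
    for a1, a2, a3, a4 in rows:
        # GROUP A BY a4, keeping only the projected columns: b = A.(a1,a2,a3)
        groups[a4].append((a1, a2, a3))

    result = []
    for a4, bag in groups.items():
        # s = DISTINCT b  (sorted only so the output order is stable)
        for a1, a2, a3 in sorted(set(bag)):
            # GENERATE FLATTEN(s), group AS a4
            result.append((a1, a2, a3, a4))
    return result


# Hypothetical sample relation A with columns (a1, a2, a3, a4).
A = [
    (1, 2, 3, 4),
    (1, 2, 3, 4),
    (1, 2, 3, 5),
    (7, 8, 9, 4),
]

A_unique = distinct_within_groups(A)
print(A_unique)
```

With this sample input, the duplicate row (1, 2, 3, 4) collapses to one row, while (1, 2, 3) still appears once for each distinct a4 value, because the deduplication happens inside each group.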
    
