How to perform a DISTINCT in Pig Latin on a subset of columns?

广开言路 2020-12-30 07:06

I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach:

You cannot us

6 answers
  •  不知归路
    2020-12-30 07:31

    The accepted answer is a great solution, but if you want to reorder the fields in the output (something I had to do recently), it might not work. Here's an alternative:

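    A = LOAD '$input' AS (f1, f2, f3, f4, f5);
    -- One group per distinct (f1, f2, f3) combination.
    GP = GROUP A BY (f1, f2, f3);
    -- OUTPUT is a reserved word in Pig, hence the alias RESULT.
    -- f4 and f5 are not group keys: A.f4 and A.f5 are bags of the group's values.
    RESULT = FOREACH GP GENERATE
        group.f1, group.f2, A.f4, A.f5, group.f3;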
    

    When you group on certain fields, each output tuple carries a unique combination of values for the group key.
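
    After a GROUP, the non-key fields (f4 and f5 here) live inside the bag A of each group, so a single flat row per key needs one tuple picked out of that bag. A minimal sketch of one way to do that, assuming it does not matter which row survives in each group; the aliases one_row and DEDUP are illustrative names, not from the original answer:

    ```pig
    A = LOAD '$input' AS (f1, f2, f3, f4, f5);
    GP = GROUP A BY (f1, f2, f3);
    DEDUP = FOREACH GP {
        -- Keep a single (arbitrary) tuple from the group's bag.
        one_row = LIMIT A 1;
        -- FLATTEN turns the one-tuple bag of (f4, f5) into two plain fields,
        -- so the output columns are f1, f2, f4, f5, f3.
        GENERATE group.f1, group.f2, FLATTEN(one_row.(f4, f5)), group.f3;
    };
    ```

    If a specific row should win (for example the one with the smallest f4), an ORDER inside the nested block before the LIMIT would presumably be the place to express that.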
