How to perform a DISTINCT in Pig Latin on a subset of columns?

广开言路 2020-12-30 07:06

I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach:

You cannot us

6 Answers
  •  执念已碎
    2020-12-30 07:10

    I was looking to do the same: "I would like to perform a DISTINCT operation on a subset of the columns". The way I did it was:

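    -- Load the data; the columns of interest are a1, a2 and a3.
    A = LOAD 'data' AS (a1, a2, a3, a4);
    -- Project only the columns to deduplicate on.
    interested_fields = FOREACH A GENERATE a1, a2, a3;
    -- distinct_fields holds the unique (a1, a2, a3) tuples.
    distinct_fields = DISTINCT interested_fields;
    -- $0 is positional notation for the first field, a1.
    final_answer = FOREACH distinct_fields GENERATE FLATTEN($0);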
    

    I know this is not an example of the nested foreach suggested in the documentation, but it is a way of doing a DISTINCT over a subset of fields. I hope it helps anyone who ends up here as I did.
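
    For comparison, a nested foreach version might look roughly like the sketch below. It is only a sketch under assumptions, not the example from the documentation quoted in the question: it reuses the 'data' file and field names from above, groups on a1 (an arbitrary choice), and the aliases grouped, projected, uniq and nested_answer are made up for illustration.

    ```pig
    A = LOAD 'data' AS (a1, a2, a3, a4);
    -- Group on one of the columns of interest (a1 is an arbitrary choice here).
    grouped = GROUP A BY a1;
    nested_answer = FOREACH grouped {
        -- Within each a1 group, keep only the remaining columns of interest.
        projected = A.(a2, a3);
        -- DISTINCT is one of the operators allowed inside a nested block.
        uniq = DISTINCT projected;
        -- Emit one (a1, a2, a3) row per distinct combination.
        GENERATE group AS a1, FLATTEN(uniq);
    };
    ```

    Like the projection followed by DISTINCT above, this should give one row per distinct (a1, a2, a3) combination and drop a4. The nested form is mainly useful when you also want something per group, such as a count of the distinct values.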
