I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach:
You cannot us
I was looking to do the same: "I would like to perform a DISTINCT operation on a subset of the columns". The way I did it was:
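-- Load the data; field names as in the question
A = LOAD 'data' AS (a1, a2, a3, a4);
-- Keep only the columns the DISTINCT should consider
interested_fields = FOREACH A GENERATE a1, a2, a3;
distinct_fields = DISTINCT interested_fields;
-- Each tuple is already flat (a1, a2, a3), so project the fields explicitly
final_answer = FOREACH distinct_fields GENERATE a1, a2, a3;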
I know it's not an example of the nested foreach suggested in the documentation, but it is a way of doing a DISTINCT over a subset of fields. Hope it helps anyone who gets here just like I did.
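For comparison, here is a rough sketch of what the nested foreach variant might look like. It assumes the same 'data' file and field names as above; the aliases grouped, projected, uniq and result are made up for illustration, and GROUP ... ALL is my assumption for getting a distinct over the whole relation rather than per group.

```pig
A = LOAD 'data' AS (a1, a2, a3, a4);
-- Put every row into a single group so the nested block sees the whole relation
grouped = GROUP A ALL;
result = FOREACH grouped {
    -- Project the subset of columns inside the nested block
    projected = A.(a1, a2, a3);
    -- DISTINCT is allowed on a relation inside a nested block
    uniq = DISTINCT projected;
    -- Un-nest the bag back into one row per distinct (a1, a2, a3)
    GENERATE FLATTEN(uniq);
};
```

If you need distinct values per key instead, group by that key rather than ALL and add group to the GENERATE. Note that GROUP ... ALL sends everything through a single reducer, so for large inputs the plain FOREACH + DISTINCT approach above is probably the better choice.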