I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach:
You cannot us
The accepted answer is a great solution, but if you want to reorder the fields in the output (something I had to do recently), it might not work. Here's an alternative:
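A = LOAD '$input' AS (f1, f2, f3, f4, f5);
GP = GROUP A BY (f1, f2, f3);
-- OUTPUT is a reserved word in Pig Latin and cannot be used as an alias.
-- After GROUP, f4 and f5 live inside the bag A, so project them as A.f4 and A.f5 (each is a bag).
RESULT = FOREACH GP GENERATE
    group.f1, group.f2, A.f4, A.f5, group.f3;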
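When you group on certain fields, each output tuple holds one unique combination of the group key values, so you can project them from group in whatever order you need. The non-key fields are collected into the bag for that group.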
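If you need exactly one flat row per (f1, f2, f3) rather than bags of f4 and f5, a nested FOREACH with LIMIT is one option. This is a minimal sketch, assuming it does not matter which f4/f5 pair is kept for each group and that your Pig version supports LIMIT inside a nested block (0.9 or later, as far as I know). The aliases first_row and DEDUPED are made up for illustration.

```pig
A = LOAD '$input' AS (f1, f2, f3, f4, f5);
GP = GROUP A BY (f1, f2, f3);
DEDUPED = FOREACH GP {
    -- keep one arbitrary row from this group's bag
    first_row = LIMIT A 1;
    -- FLATTEN turns the single-tuple bag back into plain f4, f5 columns
    GENERATE group.f1, group.f2, FLATTEN(first_row.(f4, f5)), group.f3;
};
```

If you care which row survives, you could add an ORDER inside the nested block before the LIMIT.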