I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach:
You cannot us
Group on all the other columns, project just the columns of interest into a bag, and then use FLATTEN to expand them out again:
A_unique = FOREACH (GROUP A BY a4) {
    b = A.(a1,a2,a3);
    s = DISTINCT b;
    GENERATE FLATTEN(s), group AS a4;
};
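If more than one column falls outside the distinct subset, the same pattern should still apply: group on a tuple of the remaining columns and flatten the group key as well. The following is only a sketch under assumptions. The file name `input.tsv`, the `chararray` types, and the choice of (a1, a2) as the distinct subset are hypothetical and not taken from the question.

```pig
-- Hypothetical input: four tab-separated columns (file name and types are assumptions).
A = LOAD 'input.tsv' AS (a1:chararray, a2:chararray, a3:chararray, a4:chararray);

-- Group on every column that is NOT part of the distinct subset.
G = GROUP A BY (a3, a4);

A_unique = FOREACH G {
    b = A.(a1, a2);      -- project only the columns of interest into a bag
    s = DISTINCT b;      -- remove duplicate (a1, a2) tuples within each group
    -- The group key is a tuple here, so it is flattened back into two fields.
    GENERATE FLATTEN(s), FLATTEN(group) AS (a3, a4);
};

DUMP A_unique;
```

With a single grouping column, `group` is a scalar and can be renamed directly (`group AS a4`). With several, `group` is a tuple, which is why it is flattened and given field names in this sketch.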