Question
In Pig, given the following Bag: (A, B, C), can I somehow calculate the unique combinations of all the values? The result I'm looking for is something like (AB, AC, BC). I'm disregarding BA, CA, CB since they would become duplicates of the existing values if sorted in alphabetic order.
Answer 1:
One way to do this is to write a UDF. The one below emits every pair of tuples from the input bag exactly once, pairing each tuple only with the tuples that come after it:
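import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class CombinationsUDF extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple input) throws IOException {
        // Nothing to combine if the input or its bag is missing.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        // Copy the bag into a list so tuples can be accessed by index.
        List<Tuple> bagValues = new ArrayList<Tuple>();
        for (Tuple t : (DataBag) input.get(0)) {
            bagValues.add(t);
        }
        // Pair each tuple i with every later tuple j (j > i), so each pair appears once.
        List<Tuple> outputTuples = new ArrayList<Tuple>();
        for (int i = 0; i < bagValues.size() - 1; i++) {
            List<Object> currentTupleValues = bagValues.get(i).getAll();
            for (int j = i + 1; j < bagValues.size(); j++) {
                // The output tuple holds the fields of tuple i followed by those of tuple j.
                List<Object> aux = new ArrayList<Object>(currentTupleValues);
                aux.addAll(bagValues.get(j).getAll());
                outputTuples.add(TupleFactory.getInstance().newTuple(aux));
            }
        }
        return BagFactory.getInstance().newDefaultBag(outputTuples);
    }
}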
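To check the pairing logic without a Pig installation, the same nested loop can be run on plain strings. This is only a sketch: the class name PairCombinations and the method pairs are hypothetical and not part of Pig or of the UDF above, and the input is assumed to be already in the order you want. Pig bags are unordered, so the UDF pairs tuples in whatever order the bag iterates them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Pig or of the UDF above.
class PairCombinations {

    // Returns every unordered pair (index i < index j), concatenated in input order.
    static List<String> pairs(List<String> values) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < values.size() - 1; i++) {
            for (int j = i + 1; j < values.size(); j++) {
                result.add(values.get(i) + values.get(j));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs(Arrays.asList("A", "B", "C"))); // prints [AB, AC, BC]
    }
}
```

For n input values this produces n(n-1)/2 pairs, so the output grows quadratically with the size of the bag.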
Source: https://stackoverflow.com/questions/29994246/how-to-turn-a-b-c-into-ab-ac-bc-with-pig