Pig: how to filter distinct couples (pairs)
Question

I am new to Pig. I have a Pig script that generates tab-separated pairs of elements, one pair per line. For example:

    John Paul
    Tom Nik
    Mark Bill
    Tom Nik
    Paul John

I need to filter out duplicate combinations. If I use DISTINCT, it removes the repeated "Tom Nik" entry. The result is:

    John Paul
    Tom Nik
    Mark Bill
    Paul John

The problem with this approach is that I am left with both "John Paul" and "Paul John", which for my purposes should be treated as the same combination. Is