From this:
(1, {(1,2), (1,3), (1,4)} )
(2, {(2,5), (2,6), (2,7)} )
...
How could we generate this?
((1,2),(1,3),(1,4))
((2,5
There is no builtin way to convert a bag to a tuple. Bags are unordered sets of tuples, so Pig doesn't know what order the tuples should be in when the bag is converted into a tuple. This means that you'll have to write a UDF to do it.
I'm not sure how you are creating the (1, 2, 3, 4) tuple, but this is another good candidate for a UDF, even though you could create that schema with just the BagToTuple UDF.
NOTE: You probably shouldn't be turning anything into a tuple unless you know exactly how many fields there are.
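The ordering and field-count caveats can be made explicit in the UDF logic itself. Here is a minimal sketch in plain Python, assuming the bag reaches the UDF as a list of tuples. The name bag_to_fixed_tuple and the expected_size parameter are hypothetical and only for illustration:

```python
def bag_to_fixed_tuple(bag, expected_size):
    """Convert a bag (assumed to be a list of tuples) into a tuple of known size.

    Sorting imposes a deterministic order on the otherwise unordered bag,
    and the size check fails loudly if the bag does not match the number
    of fields the declared schema expects.
    """
    rows = sorted(bag)
    if len(rows) != expected_size:
        raise ValueError("expected %d tuples, got %d" % (expected_size, len(rows)))
    return tuple(rows)


# The bag order is arbitrary, but the result is always sorted.
print(bag_to_fixed_tuple([(1, 4), (1, 2), (1, 3)], 3))  # ((1, 2), (1, 3), (1, 4))
```

Sorting is just one way to pick an order; any rule works as long as it is deterministic and matches what the downstream schema expects.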
myudfs.py
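#!/usr/bin/python

# Pig makes the outputSchema decorator available to Jython UDFs.
@outputSchema('T:(T1:(a1:chararray, a2:chararray), T2:(b1:chararray, b2:chararray), T3:(c1:chararray, c2:chararray))')
def BagToTuple(B):
    # The bag arrives as a list of tuples, in whatever order Pig hands it over.
    return tuple(B)

# No @outputSchema here, so Pig will treat the result as a bytearray.
def generateContent(B):
    # Key from the first tuple, then the second field of every tuple: (1, 2, 3, 4).
    foo = [B[0][0]] + [t[1] for t in B]
    return tuple(foo)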
myscript.pig
REGISTER 'myudfs.py' USING jython AS myudfs ;
-- A is (1, {(1,2), (1,3), (1,4)} )
-- The schema is (I:int, B:{T:(I1:int, I2:int)})
B = FOREACH A GENERATE myudfs.BagToTuple(B) ;
C = FOREACH A GENERATE myudfs.generateContent(B) ;
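To sanity-check the UDF logic without starting Pig, something like the following can be run with a plain Python interpreter. It is only a sketch: the outputSchema function here is a do-nothing stand-in for the decorator Pig supplies under Jython, and the sample bag is assumed to arrive as a list of tuples, mirroring the first row of A.

```python
def outputSchema(schema):
    # Local stand-in (assumption): under Pig the real decorator records the
    # schema string; here it simply returns the function unchanged.
    def wrap(func):
        return func
    return wrap


@outputSchema('T:(T1:(a1:chararray, a2:chararray), T2:(b1:chararray, b2:chararray), T3:(c1:chararray, c2:chararray))')
def BagToTuple(B):
    return tuple(B)


def generateContent(B):
    # Key from the first tuple, followed by the second field of each tuple.
    foo = [B[0][0]] + [t[1] for t in B]
    return tuple(foo)


# Sample bag for the row (1, {(1,2), (1,3), (1,4)}).
bag = [(1, 2), (1, 3), (1, 4)]
print(BagToTuple(bag))       # ((1, 2), (1, 3), (1, 4))
print(generateContent(bag))  # (1, 2, 3, 4)
```

This only exercises the Python logic. How Pig maps the returned values onto the declared schema still has to be checked inside an actual Pig run.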