I have a bunch of tuples in the form of composite keys and values. For example,
tfile.collect() = [(('id1','pd1','t1'),5.0),
(('id2','p
In my map step I regroup these as ((id1,t1),((p1,5.0),(p2,6.0))) and so on. After reducing by key, I apply map_group, which creates an array for [p1, p2, ...] and fills in each value at its respective position.
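import numpy as np

def map_group(pgroup):
    # pgroup is ((id, t), [(p, value), ...])
    x = np.zeros(19)
    x[0] = 1
    value_list = pgroup[1]
    for val in value_list:
        # Use the part of the p label before the first '.' as the slot number
        fno = val[0].split('.')[0]
        x[int(fno) - 5] = val[1]
    return x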
tgbr = tfile.map(lambda d: ((d[0][0], d[0][2]), [(d[0][1], d[1])])) \
            .reduceByKey(lambda p, q: p + q) \
            .map(lambda d: (d[0], map_group(d)))
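To make the data flow easier to follow, here is a minimal sketch that traces the same three steps in plain Python, without Spark. The sample records and the numeric labels '6.0' and '7.0' are hypothetical; I am assuming the real p labels start with a number, since map_group calls int() on the part before the '.'.

```python
from collections import defaultdict

import numpy as np


def map_group(pgroup):
    # Same logic as the function above: slot 0 is fixed to 1,
    # the other slots are filled from the (p, value) pairs.
    x = np.zeros(19)
    x[0] = 1
    for fid, value in pgroup[1]:
        x[int(fid.split('.')[0]) - 5] = value
    return x


# Hypothetical records in the shape ((id, p, t), value).
records = [
    (('id1', '6.0', 't1'), 5.0),
    (('id1', '7.0', 't1'), 6.0),
    (('id2', '6.0', 't1'), 2.5),
]

# Step 1 (the first map): re-key by (id, t), wrap (p, value) in a list.
pairs = [((k[0], k[2]), [(k[1], v)]) for k, v in records]

# Step 2 (local stand-in for reduceByKey): concatenate the lists per key.
grouped = defaultdict(list)
for key, items in pairs:
    grouped[key] = grouped[key] + items

# Step 3 (the final map): turn each group into a fixed-length vector.
vectors = {key: map_group((key, items)) for key, items in grouped.items()}
```

With these records, the key ('id1', 't1') ends up with 5.0 in slot 1 and 6.0 in slot 2, because the slot index is the label number minus 5.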
This feels like a computationally expensive solution, but it works for now.
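One possible way to cut some of that cost, offered only as a sketch: the p + q in reduceByKey builds a new list on every merge, so the vector could instead be filled in directly with aggregateByKey. The helper names seq_op, comb_op and finalize are hypothetical. The sketch assumes each (id, t) key sees each p label at most once and that no label maps to slot 0; otherwise summing the partial arrays would not match the original behaviour.

```python
import numpy as np

N_SLOTS = 19  # same vector length as in map_group


def seq_op(x, item):
    # Fold one (p, value) pair into the partial vector for a key.
    fid, value = item
    x[int(fid.split('.')[0]) - 5] = value
    return x


def comb_op(a, b):
    # Merge partial vectors from two partitions. Summing is only valid
    # under the assumption that each slot is written at most once per key.
    return a + b


def finalize(x):
    # Set the constant first slot once, after all merges are done.
    x[0] = 1
    return x


# Local check that simulates two partitions holding the same key.
part1 = seq_op(np.zeros(N_SLOTS), ('6.0', 5.0))
part2 = seq_op(np.zeros(N_SLOTS), ('7.0', 6.0))
merged = finalize(comb_op(part1, part2))

# In Spark this would be wired up roughly as follows (untested here):
# tgbr = tfile.map(lambda d: ((d[0][0], d[0][2]), (d[0][1], d[1]))) \
#             .aggregateByKey(np.zeros(N_SLOTS), seq_op, comb_op) \
#             .mapValues(finalize)
```

Whether this is actually faster depends on the data and the partitioning, so it would need to be measured on the real job.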