I am trying to figure out why my groupByKey is returning the following:
[(0, ), (1,
Instead of using groupByKey(), I would suggest using cogroup(). See the example below.
[(k, tuple(map(list, v))) for k, v in sorted(x.cogroup(y).collect())]
Example:
>>> x = sc.parallelize([("foo", 1), ("bar", 4)])
>>> y = sc.parallelize([("foo", -1)])
>>> z = [(k, tuple(map(list, v))) for k, v in sorted(x.cogroup(y).collect())]
>>> print(z)
This should print the grouped values as plain lists: [('bar', ([4], [])), ('foo', ([1], [-1]))]
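To see what cogroup() is computing without a SparkContext, here is a rough plain-Python sketch of the same grouping. The helper name cogroup_local is made up for illustration and is not part of PySpark; it only mimics the shape of the result on ordinary lists of (key, value) pairs.

```python
from collections import defaultdict


def cogroup_local(left, right):
    """Hypothetical stand-in for RDD.cogroup on two lists of (key, value) pairs.

    For every key seen in either input, collect the values from `left`
    and the values from `right` into a pair of lists.
    """
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)   # values from the first dataset
    for key, value in right:
        grouped[key][1].append(value)   # values from the second dataset
    # Sort by key so the output order is deterministic.
    return sorted(grouped.items())


x_pairs = [("foo", 1), ("bar", 4)]
y_pairs = [("foo", -1)]
result = cogroup_local(x_pairs, y_pairs)
print(result)
```

A key that appears in only one input still shows up, with an empty list on the other side. The real cogroup() returns iterable objects rather than lists for each side, which is why the example converts them with map(list, ...).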