PySpark groupByKey returning pyspark.resultiterable.ResultIterable

不思量自难忘° 2021-01-30 16:24

I am trying to figure out why my groupByKey is returning the following:

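[(0, <pyspark.resultiterable.ResultIterable object at 0x...>), (1, <pyspark.resultiterable.ResultIterable object at 0x...>), ...]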

        
6 Answers
  •  Happy的楠姐
    2021-01-30 16:47

    Instead of using groupByKey(), I would suggest using cogroup(), which groups the values from two RDDs by key. See the example below.

    [(x, tuple(map(list, y))) for x, y in sorted(list(x.cogroup(y).collect()))]
    

    Example:

    >>> x = sc.parallelize([("foo", 1), ("bar", 4)])
    >>> y = sc.parallelize([("foo", -1)])
    >>> z = [(x, tuple(map(list, y))) for x, y in sorted(list(x.cogroup(y).collect()))]
    >>> print(z)
    [('bar', ([4], [])), ('foo', ([1], [-1]))]

    This should give you the grouped values as plain lists rather than ResultIterable objects.
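
    If only a single RDD is involved, it may be enough to convert each grouped value to a list. The sketch below runs without Spark: GroupedValues is a hypothetical stand-in for pyspark.resultiterable.ResultIterable (assumed here to be a plain iterable wrapper with no custom repr), used only to show why the output prints as object addresses and how list() makes the contents visible. With a real RDD, the analogous step would likely be rdd.groupByKey().mapValues(list).collect().

```python
# Hypothetical stand-in for pyspark.resultiterable.ResultIterable (illustration only):
# it wraps the grouped values and is iterable, but defines no __repr__.
class GroupedValues:
    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Shape of what groupByKey().collect() hands back: (key, iterable-of-values) pairs.
grouped = [(0, GroupedValues([1, 2])), (1, GroupedValues([3]))]

# The values print as "<... GroupedValues object at 0x...>", not as their contents.
print(grouped)

# Converting each value to a list makes the contents visible.
# Assumed equivalent on a real RDD: rdd.groupByKey().mapValues(list).collect()
materialized = [(key, list(values)) for key, values in grouped]
print(materialized)  # [(0, [1, 2]), (1, [3])]
```

    The grouped values are not lost in the original output. They are only hidden behind the default object repr until something iterates over them.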
