Reduce a key-value pair into a key-list pair with Apache Spark

生来不讨喜 2020-11-27 14:21

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ...

9 Answers
  •  遥遥无期
    2020-11-27 15:00

    I'm kind of late to the conversation, but here's my suggestion:

    >>> foo = sc.parallelize([(1, ('a','b')), (2, ('c','d')), (1, ('x','y'))])
    >>> # Wrap each value in a one-element list, then concatenate the lists per key
    >>> foo.mapValues(lambda v: [v]).reduceByKey(lambda p, q: p + q).collect()
    [(1, [('a', 'b'), ('x', 'y')]), (2, [('c', 'd')])]
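For readers without a Spark cluster at hand, here is a minimal pure-Python sketch of what the wrap-then-reduceByKey pipeline computes. `local_reduce_by_key` and `to_key_lists` are hypothetical helpers, not part of the PySpark API. The inner lists stand in for RDD partitions, and the sketch assumes the usual reduceByKey behaviour of combining values per key inside each partition before merging the partial results. Real Spark does not guarantee key order or the order in which partitions are merged.

```python
def local_reduce_by_key(partitions, func):
    """Hypothetical local stand-in for RDD.reduceByKey (not the Spark API).

    Step 1: combine values per key inside each partition (map-side combine).
    Step 2: merge the per-partition results with the same function.
    """
    partials = []
    for part in partitions:
        acc = {}
        for k, v in part:
            acc[k] = func(acc[k], v) if k in acc else v
        partials.append(acc)

    merged = {}
    for acc in partials:
        for k, v in acc.items():
            merged[k] = func(merged[k], v) if k in merged else v
    return list(merged.items())


def to_key_lists(partitions):
    """Hypothetical helper: turn (K, V) pairs into (K, [V1, V2, ...]) pairs."""
    # Equivalent of mapValues(lambda v: [v]): wrap every value in a list.
    wrapped = [[(k, [v]) for k, v in part] for part in partitions]
    # p + q builds a new list, so no accumulator is mutated in place.
    return local_reduce_by_key(wrapped, lambda p, q: p + q)


# Two pretend partitions holding the same pairs as the answer's example.
partitions = [
    [(1, ('a', 'b')), (2, ('c', 'd'))],
    [(1, ('x', 'y'))],
]
print(to_key_lists(partitions))
```

On the Spark side, `groupByKey().mapValues(list)` yields the same key-list pairs directly. Because concatenating lists does not shrink the data being shuffled, the reduceByKey version offers little size advantage over groupByKey for this particular task.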
    
