Reduce a key-value pair into a key-list pair with Apache Spark

生来不讨喜 2020-11-27 14:21

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ...

9 Answers
  •  轻奢々 (OP)
    2020-11-27 15:20

    You can use the RDD groupByKey method.

    Input:

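    data = [(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (2, 'e'), (3, 'f')]
    rdd = sc.parallelize(data)  # sc is an existing SparkContext
    # groupByKey yields (key, iterable) pairs; mapValues(list) turns each
    # iterable into a plain list. Key order in the result is not guaranteed.
    result = rdd.groupByKey().mapValues(list).collect()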
    

    Output:

    [(1, ['a', 'b']), (2, ['c', 'd', 'e']), (3, ['f'])]
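
For intuition, here is a rough plain-Python sketch of how such a grouping can be built up partition by partition. It does not use Spark at all. The function names (`create_combiner`, `merge_value`, `merge_combiners`, `group_locally`) and the two-partition split of the sample data are made up for illustration.

```python
def create_combiner(value):
    # The first value seen for a key within a partition starts a new list.
    return [value]


def merge_value(acc, value):
    # Later values for the same key in the same partition are appended.
    acc.append(value)
    return acc


def merge_combiners(left, right):
    # Lists built on different partitions are concatenated.
    return left + right


def group_locally(partitions):
    """Simulate per-partition grouping followed by a cross-partition merge."""
    merged = {}
    for partition in partitions:
        local = {}
        for key, value in partition:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        for key, values in local.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], values)
            else:
                merged[key] = values
    # Sort by key only to make the printed result deterministic.
    return sorted(merged.items())


# Hypothetical split of the sample data into two partitions.
partitions = [
    [(1, 'a'), (2, 'c'), (2, 'd')],
    [(1, 'b'), (2, 'e'), (3, 'f')],
]
grouped = group_locally(partitions)
print(grouped)
```

In PySpark, the same three functions could presumably be passed to `rdd.combineByKey(create_combiner, merge_value, merge_combiners)` to get key-list pairs without going through `groupByKey`. Treat that as a sketch to try out rather than a tuned solution.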
    
