Reduce a key-value pair into a key-list pair with Apache Spark

生来不讨喜 2020-11-27 14:21

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ...

9 answers
  •  挽巷 (original poster)
    2020-11-27 15:13

    I tried this with combineByKey. The code is below, followed by the steps it implements.

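    # `sc` is an existing SparkContext
    combineddatardd = sc.parallelize([("A", 3), ("A", 9), ("A", 12), ("B", 4), ("B", 10), ("B", 11)])

    # Arguments, in order: createCombiner, mergeValue, mergeCombiners
    combineddatardd.combineByKey(lambda v: [v], lambda x, y: x + [y], lambda x, y: x + y).collect()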
    

    Output:

    [('A', [3, 9, 12]), ('B', [4, 10, 11])]
    
    1. Define the combiner-creation function (createCombiner). It is called the first time a key is encountered within a partition and initializes the accumulator by wrapping that value in a list.

    2. Define the function (mergeValue) that merges each later value for the same key, within the same partition, into the accumulator created in step 1. Note: because the accumulator is a list, the new value must be wrapped in a list before it is appended.

    3. Define the function (mergeCombiners) that merges the accumulators produced by the individual partitions.
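
    To make it easier to see when each of the three functions is called, here is a rough plain-Python simulation of the steps above. It does not use Spark. The helper simulate_combine_by_key and the way the sample data is split into two partitions are assumptions made for illustration only; real Spark decides the partitioning itself and runs the per-partition work in parallel.

```python
def simulate_combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Hypothetical plain-Python model of combineByKey (not Spark's implementation)."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key not in acc:
                # Step 1: first time this key is seen in this partition
                acc[key] = create_combiner(value)
            else:
                # Step 2: fold a later value into the existing accumulator
                acc[key] = merge_value(acc[key], value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, combiner in acc.items():
            if key in merged:
                # Step 3: combine accumulators coming from different partitions
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged


# Assumed split of the sample data into two partitions
partitions = [
    [("A", 3), ("A", 9), ("B", 4)],
    [("A", 12), ("B", 10), ("B", 11)],
]

result = simulate_combine_by_key(
    partitions,
    lambda v: [v],            # step 1: wrap the first value in a list
    lambda acc, v: acc + [v], # step 2: append a value to the list
    lambda a, b: a + b,       # step 3: concatenate two lists
)
print(sorted(result.items()))  # [('A', [3, 9, 12]), ('B', [4, 10, 11])]
```

    With this split, key "A" reaches step 3 as [3, 9] from the first partition and [12] from the second, which is why the third function has to concatenate two lists rather than append a single value.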
