Reduce a key-value pair into a key-list pair with Apache Spark

生来不讨喜 2020-11-27 14:21

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ...

9 Answers
  •  猫巷女王i
    2020-11-27 15:03

    OK, I hope I got this right. Your input is something like this:

    kv_input = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 5)]
    

    and you want to get something like this:

    kmv_output = [("a", [1, 2, 3]), ("b", [1, 5])]
    

    Then this might do the job:

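    d = {}
    for k, v in kv_input:
        # Create an empty list the first time a key is seen, then append the value.
        d.setdefault(k, []).append(v)
    # Convert the dict back into a list of (key, [values]) pairs.
    kmv_output = list(d.items())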
    

    If I got this wrong, please tell me so I can adjust it to your needs.

    P.S.: a.append([b]) always returns None, because append modifies the list in place. You probably want to look at either [b] or a, not at the return value of append.
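
To illustrate the P.S., here is a minimal plain-Python sketch (no Spark involved). The names `values`, `acc`, and `combined` are made up for the example, and the `functools.reduce` call is only a stand-in for a combining function that has to return its result; it is not taken from the question's code.

```python
from functools import reduce

a = [1]
result = a.append([2])   # append mutates a in place...
print(result)            # ...and its return value is None
print(a)                 # a now holds the nested list: [1, [2]]

# A combining function must return the new list, so build it with +
# instead of relying on the return value of append.
values = [1, 2, 3]       # hypothetical values belonging to one key
combined = reduce(lambda acc, v: acc + [v], values, [])
print(combined)
```

Using `acc + [v]` creates a new list on every step, which is fine for a small illustration but can get slow for long value lists.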
