I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ...
OK, I hope I got this right. Your input is something like this:
kv_input = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 5)]
and you want to get something like this:
kmv_output = [("a", [1, 2, 3]), ("b", [1, 5])]
Then this might do the job (see here):
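# Group the values by key; dicts keep insertion order in Python 3.7+
d = {}
for k, v in kv_input:
    # setdefault returns the existing list for k, or inserts and returns a new one
    d.setdefault(k, []).append(v)
kmv_output = list(d.items())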
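The same grouping can also be sketched with collections.defaultdict, which avoids the setdefault call. The helper name group_by_key is my own and is not taken from the question; this is plain Python that runs on a local list, not on a Spark RDD.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect the values of (key, value) pairs into one list per key."""
    grouped = defaultdict(list)  # missing keys start out as an empty list
    for k, v in pairs:
        grouped[k].append(v)
    return list(grouped.items())


kv_input = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 5)]
kmv_output = group_by_key(kv_input)
print(kmv_output)  # [('a', [1, 2, 3]), ('b', [1, 5])]
```

Keys appear in the order they were first seen, and the values for each key keep their input order.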
If I got this wrong, please tell me so I can adjust this to your needs.
P.S.: a.append([b]) always returns None, because append modifies the list a in place. Inspect a (or [b]) afterwards, not the return value of append.
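A small illustration of that point, using made-up values for a and b (they are placeholders, not taken from your code):

```python
a = [1, 2]
b = 3

result = a.append([b])  # append mutates a in place and returns None
print(result)           # None
print(a)                # [1, 2, [3]]

# Appending b itself instead of [b] keeps the list flat.
flat = [1, 2]
flat.append(b)
print(flat)             # [1, 2, 3]
```

Note that a.append([b]) adds the one-element list [b] as a single nested item. If you want b itself in the list, a.append(b) is probably what you are after.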