Question
Suppose I have key-value pairs in Spark, such as the following.
[ (Key1, Value1), (Key1, Value2), (Key1, Value3), (Key2, Value4), (Key2, Value5) ]
Now I want to reduce this to something like the following.
[ (Key1, [Value1, Value2, Value3]), (Key2, [Value4, Value5]) ]
That is, from key-value pairs to each key with a list of its values.
How can I do that using the map and reduce functions in Python or Scala?
Answer 1:
collections.defaultdict can be the solution: https://docs.python.org/2/library/collections.html#collections.defaultdict
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for key, value in [('Key1', 'Value1'), ('Key1', 'Value2'), ('Key1', 'Value3'), ('Key2', 'Value4'), ('Key2', 'Value5')]:
...     d[key].append(value)
...
>>> print d.items()
[('Key2', ['Value4', 'Value5']), ('Key1', ['Value1', 'Value2', 'Value3'])]
Answer 2:
val data = Seq(("Key1", "Value1"), ("Key1", "Value2"), ("Key1", "Value3"), ("Key2", "Value4"), ("Key2", "Value5"))
data
  .groupBy(_._1)
  .mapValues(_.map(_._2))
res0: scala.collection.immutable.Map[String,Seq[String]] =
Map(
  Key2 -> List(Value4, Value5),
  Key1 -> List(Value1, Value2, Value3))
Answer 3:
I'm sure there's a more readable way to do this, but the first thing that comes to mind is itertools.groupby. Sort the list by the first element of each tuple (the key), since groupby only groups consecutive items with equal keys. Then use a list comprehension to iterate over the groups.
from itertools import groupby
l = [('key1', 1), ('key1', 2), ('key1', 3), ('key2', 4), ('key2', 5)]
l.sort(key=lambda i: i[0])
[(key, [i[1] for i in values]) for key, values in groupby(l, lambda i: i[0])]
Output:
[('key1', [1, 2, 3]), ('key2', [4, 5])]
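To illustrate why the sort matters, here is a small sketch in Python 3. The helper name group_consecutive and the shuffled input are made up for this illustration. Without sorting, groupby starts a new group every time the key changes, so the same key shows up several times.

```python
from itertools import groupby
from operator import itemgetter

# Same pairs as above, but deliberately interleaved (illustrative input).
pairs = [('key1', 1), ('key2', 4), ('key1', 2), ('key2', 5), ('key1', 3)]

def group_consecutive(items):
    # Collect the values of each run of consecutive items with equal keys.
    return [(key, [value for _, value in run])
            for key, run in groupby(items, key=itemgetter(0))]

# Without sorting, every key change opens a new group.
unsorted_result = group_consecutive(pairs)

# sorted() is stable, so values keep their original order within a key.
sorted_result = group_consecutive(sorted(pairs, key=itemgetter(0)))

print(unsorted_result)
print(sorted_result)
```

Only the sorted call gives one entry per key.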
Answer 4:
Something like this:
newlist = dict()
for x in l:  # l is the list of (key, value) tuples
    if x[0] not in newlist:
        newlist[x[0]] = list()
    newlist[x[0]].append(x[1])
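Since the question asks for map and reduce specifically, the same loop can be phrased as a map step followed by a reduce-by-key step. The following is only a sketch in plain Python 3, not Spark code. The helper reduce_by_key is hypothetical and written here for illustration; it is similar in spirit to Spark's reduceByKey but runs on an ordinary list.

```python
def reduce_by_key(func, pairs):
    # Hypothetical helper: combine the values of equal keys with func.
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

pairs = [('Key1', 'Value1'), ('Key1', 'Value2'), ('Key1', 'Value3'),
         ('Key2', 'Value4'), ('Key2', 'Value5')]

# Map step: wrap every value in a one-element list.
mapped = map(lambda kv: (kv[0], [kv[1]]), pairs)

# Reduce step: concatenate the lists that share a key.
grouped = reduce_by_key(lambda a, b: a + b, mapped)

print(list(grouped.items()))
```

Concatenating lists with + copies them on every merge, so this form is meant to show the shape of the computation rather than to be efficient.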
Answer 5:
The shortest version, using defaultdict, is the following (Python 2 syntax); the input does not need to be sorted.
>>> from collections import defaultdict
>>> collect = lambda tuplist: reduce(lambda acc, (k,v): acc[k].append(v) or acc,
...                                  tuplist, defaultdict(list))
>>> collect([(1,0), (2,0), (1,2), (2,3)])
defaultdict(<type 'list'>, {1: [0, 2], 2: [0, 3]})
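The session above relies on two Python 2 features: the builtin reduce and tuple unpacking in lambda parameters. A possible Python 3 port, as a sketch, imports reduce from functools and indexes into the pair instead. The name result is introduced here only for the example.

```python
from collections import defaultdict
from functools import reduce

def collect(tuplist):
    # list.append returns None, so "or acc" hands the accumulator back to reduce.
    return reduce(lambda acc, kv: acc[kv[0]].append(kv[1]) or acc,
                  tuplist, defaultdict(list))

result = collect([(1, 0), (2, 0), (1, 2), (2, 3)])
print(dict(result))
```

As in the original, the accumulator is mutated in place, so the input order is preserved within each key and no sorting is needed.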
Answer 6:
Another Scala one, avoiding groupBy/mapValues (although that is the obvious Scala solution). This one follows the Python one given by Vishni, since @MetallicPriest commented that it was "much easier".
val data = Seq(("Key1", "Value1"), ("Key1", "Value2"), ("Key1", "Value3"),
               ("Key2", "Value4"), ("Key2", "Value5"))
val dict = Map[String, Seq[String]]() withDefaultValue(Nil)
data.foldLeft(dict){ case (d, (k,v)) => d updated (k, d(k) :+ v) }
// Map(Key1 -> List(Value1, Value2, Value3), Key2 -> List(Value4, Value5))
(This appends each value so that the result matches the question exactly. Prepending would be more efficient, though.)
Mutable version, even closer to the Python one:
import scala.collection.mutable.{Map, Seq}
val dict = Map[String, Seq[String]]() withDefaultValue(Seq())
for ((k,v) <- data) dict(k) :+= v
dict
// Map(Key2 -> ArrayBuffer(Value4, Value5),
//     Key1 -> ArrayBuffer(Value1, Value2, Value3))
Source: https://stackoverflow.com/questions/26780348/how-can-a-reduce-a-key-value-pair-to-key-and-list-of-values