问题
I often need to do something like
coll.groupBy(f(_)).mapValues(_.foldLeft(x)(g(_,_)))
What is the best way to achieve the same effect, but avoid explicitly constructing the intermediate collections with groupBy
?
回答1:
You could fold the initial collection over a map holding your intermediate results:
def groupFold[A,B,X](as: Iterable[A], f: A => B, init: X, g: (X,A) => X): Map[B,X] =
as.foldLeft(Map[B,X]().withDefaultValue(init)){
case (m,a) => {
val key = f(a)
m.updated(key, g(m(key),a))
}
}
You said collection and I wrote Iterable
, but you have to think whether order matters in the fold in your question.
If you want efficient code, you will probably use a mutable map as in Rex' answer.
回答2:
You can't really do it as a one-liner, so you should be sure you need it before writing something more elaborate like this (written from a somewhat performance-minded view since you asked for "efficient"):
final case class Var[A](var value: A) { }
def multifold[A,B,C](xs: Traversable[A])(f: A => B)(zero: C)(g: (C,A) => C) = {
import scala.collection.JavaConverters._
val m = new java.util.HashMap[B, Var[C]]
xs.foreach{ x =>
val v = {
val fx = f(x)
val op = m.get(fx)
if (op != null) op
else { val nv = Var(zero); m.put(fx, nv); nv }
}
v.value = g(v.value, x)
}
m.asScala.mapValues(_.value)
}
(Depending on your use case you may wish to pack into an immutable map instead in the last step.) Here's an example of it in action:
scala> multifold(List("salmon","herring","haddock"))(_(0))(0)(_ + _.length)
res1: scala.collection.mutable.HashMap[Char,Int] = Map(h -> 14, s -> 6)
Now, you might notice something weird here: I'm using a Java HashMap. This is because Java's HashMaps are 2-3x faster than Scala's. (You can write the equivalent thing with a Scala HashMap, but it doesn't actually make things any faster than your original.) Consequently, this operation is 2-3x faster than what you posted. But unless you're under severe memory pressure, creating the transient collections doesn't actually hurt you much.
来源:https://stackoverflow.com/questions/15529726/efficient-groupwise-aggregation-on-scala-collections