Suppose you have
val docs = List(List("one", "two"), List("two", "three"))
where, e.g., List("one", "two") represents a document. The word counts across all documents can be computed with:
docs.flatten.foldLeft(new Map.WithDefault(Map[String,Int](),Function.const(0))){
(m,x) => m + (x -> (1 + m(x)))}
What a train wreck!
[Edit]
Ah, that's better!
docs.flatten.foldLeft(Map[String,Int]() withDefaultValue 0){
(m,x) => m + (x -> (1 + m(x)))}
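For reference, here is a minimal self-contained sketch of the same fold. The names empty, acc and word are illustrative only and not from the original post. It relies on updated (like +) keeping the default value on the resulting map, so every lookup of an unseen word yields 0:

```scala
val docs = List(List("one", "two"), List("two", "three"))

// Start from an empty map whose lookups fall back to 0 for unseen words.
val empty = Map.empty[String, Int].withDefaultValue(0)

// Walk every word once; bump its count, relying on the default for new words.
val counts = docs.flatten.foldLeft(empty) { (acc, word) =>
  acc.updated(word, acc(word) + 1)
}

println(counts("two"))     // 2
println(counts("missing")) // 0, via the default value
```

The default only affects apply; get and contains still report absent keys as absent.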
Try this:
scala> docs.flatten.groupBy(identity).mapValues(_.size)
res0: Map[String,Int] = Map(one -> 1, two -> 2, three -> 1)
If you are going to access the counts many times, avoid mapValues: up to Scala 2.12 it returns a lazy view (from 2.13 it is deprecated in favor of .view.mapValues), so the size is recomputed on every access. This version gives the same result without the recomputation:
docs.flatten.groupBy(identity).map(x => (x._1, x._2.size))
The identity function just means x => x.
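To make the difference visible, here is a small sketch that counts how often the size function runs. It assumes Scala 2.13 or later, where the lazy form is spelled .view.mapValues; the calls counter and the other names are hypothetical and only there for illustration:

```scala
// Assumes Scala 2.13+ (lazy form written as .view.mapValues).
val docs = List(List("one", "two"), List("two", "three"))
val grouped = docs.flatten.groupBy(identity)

var calls = 0 // hypothetical counter: how many times the size is computed

// Lazy view: nothing is computed up front, each lookup re-runs the function.
val lazyCounts = grouped.view.mapValues { ws => calls += 1; ws.size }
val first = lazyCounts("two")
val second = lazyCounts("two")
val callsAfterLazyLookups = calls // one call per lookup

// Strict map: the function runs once per distinct word when the map is built.
calls = 0
val strictCounts = grouped.map { case (word, ws) => calls += 1; word -> ws.size }
val third = strictCounts("two")
val fourth = strictCounts("two")
val callsAfterStrictLookups = calls // lookups add no further calls
```

Calling .toMap on the view would also materialize it once, at the cost of one extra pass.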
Starting with Scala 2.13, after flattening the list of lists, we can use groupMapReduce, which is a one-pass alternative to groupBy/mapValues:
// val docs = List(List("one", "two"), List("two", "three"))
docs.flatten.groupMapReduce(identity)(_ => 1)(_ + _)
// Map[String,Int] = Map("one" -> 1, "three" -> 1, "two" -> 2)
This:
flattens the List of Lists into a single List
groups the list elements by themselves (identity) (group part of groupMapReduce)
maps each occurrence within a group to 1 (_ => 1) (map part of groupMapReduce)
reduces the values within each group by summing them (_ + _) (reduce part of groupMapReduce).
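The steps above can be spelled out one at a time. This is only an illustrative sketch (the intermediate names are made up here, and it builds intermediate collections that groupMapReduce itself avoids), but it should produce the same map:

```scala
val docs = List(List("one", "two"), List("two", "three"))

// Flatten: one list containing every word from every document.
val words = docs.flatten

// Group: each distinct word maps to the list of its occurrences.
val grouped = words.groupBy(identity)

// Map: replace every occurrence with 1.
val ones = grouped.map { case (word, occurrences) => word -> occurrences.map(_ => 1) }

// Reduce: sum the 1s within each group.
val counts = ones.map { case (word, os) => word -> os.reduce(_ + _) }

println(counts("two")) // 2
```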