Suppose you have
val docs = List(List("one", "two"), List("two", "three"))
where e.g. List("one", "two") represents a document, and you want to count how often each word occurs across all documents:
docs.flatten.foldLeft(new Map.WithDefault(Map[String,Int](),Function.const(0))){
(m,x) => m + (x -> (1 + m(x)))}
What a train wreck!
[Edit]
Ah, that's better!
docs.flatten.foldLeft(Map[String,Int]() withDefaultValue 0){
(m,x) => m + (x -> (1 + m(x)))}
Try this:
scala> docs.flatten.groupBy(identity).mapValues(_.size)
res0: Map[String,Int] = Map(one -> 1, two -> 2, three -> 1)
If you are going to access the counts many times, avoid mapValues: it returns a lazy view, so the size is recomputed on every access. This version gives the same result without the recomputation:
docs.flatten.groupBy(identity).map(x => (x._1, x._2.size))
The identity function just means x => x.
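To make the difference between the lazy and the strict version visible, here is a small sketch that counts how often the size function runs. The names calls, sizeOf, strictCounts and lazyCounts are made up for this illustration. Note that on Scala 2.13 a plain mapValues on a Map still compiles but emits a deprecation warning.

```scala
val docs = List(List("one", "two"), List("two", "three"))

// Hypothetical instrumentation: counts how often the size function is invoked.
var calls = 0
def sizeOf(words: List[String]): Int = { calls += 1; words.size }

val grouped = docs.flatten.groupBy(identity)

// Strict: map builds a new Map right away, so sizeOf runs once per key.
val strictCounts = grouped.map { case (word, occurrences) => (word, sizeOf(occurrences)) }
val callsAfterBuild = calls
strictCounts("two")
strictCounts("two")
val callsAfterStrictLookups = calls // lookups do not call sizeOf again

// Lazy: mapValues only wraps the original Map; sizeOf runs on each lookup.
calls = 0
val lazyCounts = grouped.mapValues(occurrences => sizeOf(occurrences))
lazyCounts("two")
lazyCounts("two")
val callsAfterLazyLookups = calls
```

If only a handful of lookups are ever made, the lazy view may be perfectly fine; the strict version pays the whole cost once, up front.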
Starting with Scala 2.13, after flattening the list of lists, we can use groupMapReduce, which is a one-pass alternative to groupBy/mapValues:
// val docs = List(List("one", "two"), List("two", "three"))
docs.flatten.groupMapReduce(identity)(_ => 1)(_ + _)
// Map[String,Int] = Map("one" -> 1, "three" -> 1, "two" -> 2)
This:
- flattens the List of Lists as a List
- groups list elements (identity) (group part of groupMapReduce)
- maps each grouped value occurrence to 1 (_ => 1) (map part of groupMapReduce)
- reduces values within a group of values (_ + _) by summing them (reduce part of groupMapReduce).
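The three steps can also be sketched by hand as a single foldLeft, which may help to see why only one pass is needed. The helper groupMapReduceSketch below is hypothetical and written only for illustration; it is not how the standard library implements groupMapReduce.

```scala
val docs = List(List("one", "two"), List("two", "three"))

// Hypothetical helper mirroring the group / map / reduce steps in one pass.
def groupMapReduceSketch[A, K, B](xs: List[A])(key: A => K)(f: A => B)(reduce: (B, B) => B): Map[K, B] =
  xs.foldLeft(Map.empty[K, B]) { (acc, x) =>
    val k = key(x) // group: which bucket does x belong to?
    val v = f(x)   // map: the value x contributes to its bucket
    // reduce: combine with the value already stored for k, if there is one
    acc.updated(k, acc.get(k).map(reduce(_, v)).getOrElse(v))
  }

val counts = groupMapReduceSketch(docs.flatten)(identity)(_ => 1)(_ + _)
```

Because the sketch avoids groupMapReduce itself, it should also compile on versions before 2.13, where it is essentially the foldLeft solution from the question with the key, map and reduce functions pulled out as parameters.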