Let's say I have a string, "hello", and I want to generate a character frequency map:
Map[Char,Int] = Map(h -> 1, e -> 1, o -> 1, l -> 2)
Extending Axel's answer.
Your groupBy solution is already functional. There's just one tiny correction that could make it cleaner:
str.groupBy(_.toChar).mapValues(_.size)
Scala's alternatives to inject are foldLeft, foldRight, reduce, and reduceOption, depending on how you use it. The way you've used inject in Ruby is not functional, since your solution is based on mutating h, and in the functional world mutability is a "no-no". Here's how you'd write a solution close to your inject, but in functional style, in Scala:
str.foldLeft( Map[Char, Int]() ){ (m, c) => m + (c -> (m.getOrElse(c, 0) + 1)) }
Obviously, groupBy looks much better.
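As a rough guide to which method corresponds to which use of inject, here is a small sketch on a list of integers. The object and value names are made up for illustration; only the standard library methods named above are used.

```scala
object InjectEquivalents {
  val xs = List(1, 2, 3)

  // Ruby: xs.inject(0) { |acc, x| acc + x }  (explicit initial value)
  val viaFoldLeft: Int = xs.foldLeft(0)(_ + _)

  // Ruby: xs.inject { |acc, x| acc + x }  (first element is the initial value)
  val viaReduce: Int = xs.reduce(_ + _)

  // reduce throws on an empty collection; reduceOption returns None instead.
  val onEmpty: Option[Int] = List.empty[Int].reduceOption(_ + _)

  // foldLeft and foldRight differ in the order in which elements are combined.
  val leftToRight: String = xs.foldLeft("")((acc, x) => acc + x)
  val rightToLeft: String = xs.foldRight("")((x, acc) => acc + x)
}
```

With an explicit initial accumulator such as an empty Map, foldLeft is the closest match, which is why it is used for the frequency map.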
What does p mean?
groupBy takes a function which maps an element to a key of type K. When invoked on some collection Coll, it returns a Map[K, Coll] which maps each key K to all the elements that were mapped to that key.
So, in your case, str.groupBy(_.toChar) yields a map from a key k (which is a character) to a string with all the elements (characters) c such that k == c.toChar.
You get this:
Map(e -> "e", h -> "h", l -> "ll", o -> "o")
A Map is an iterable of pairs of keys and values. In this case, each pair is a character and a string of elements. Calling the map operation on a Map involves mapping on these pairs: p is a pair where p._1 is a character, and p._2 is the associated string (on which you can call length, as you did above).
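The two steps described above can be sketched separately. The object and method names here are illustrative, and the pattern match `case (c, s)` is just an alternative way of naming p._1 and p._2, not something taken from the question.

```scala
object GroupByDemo {
  // Step 1: group equal characters together, e.g. 'l' -> "ll" for "hello".
  def grouped(str: String): Map[Char, String] = str.groupBy(identity)

  // Step 2: map over the (key, value) pairs of the resulting Map.
  // `case (c, s)` destructures the pair that the original code calls p.
  def frequencies(str: String): Map[Char, Int] =
    grouped(str).map { case (c, s) => (c, s.length) }

  def main(args: Array[String]): Unit = {
    println(grouped("hello"))
    println(frequencies("hello"))
  }
}
```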
The above is how to do it idiomatically: using groupBy and map. Alternatively, you can use an immutable map and recursion on the string length to compute the frequencies, or an immutable map and a foldLeft.
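The recursive alternative is only mentioned in passing, so here is one way it might look. This is a minimal sketch with hypothetical names (RecursiveFreq, loop), recursing on an index into the string and threading an immutable Map through as the accumulator.

```scala
import scala.annotation.tailrec

object RecursiveFreq {
  def frequencies(str: String): Map[Char, Int] = {
    // i is the current position; acc holds the counts seen so far.
    @tailrec
    def loop(i: Int, acc: Map[Char, Int]): Map[Char, Int] =
      if (i == str.length) acc
      else {
        val c = str.charAt(i)
        loop(i + 1, acc.updated(c, acc.getOrElse(c, 0) + 1))
      }

    loop(0, Map.empty)
  }
}
```

The @tailrec annotation asks the compiler to verify that the recursion is compiled to a loop, so long strings do not overflow the stack.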
Best to benchmark to see the differences. Here are a couple of microbenchmarks for a highly repetitive string (~3GHz iMac, JDK7, Scala 2.10.0 nightly):
object Imperative extends testing.Benchmark {
  val str = "abc" * 750000
  def run() {
    val counts = new scala.collection.mutable.HashMap[Char,Int]
    var i = 0
    val until = str.length
    while (i < until) {
      val c = str(i)
      if (counts.contains(c))
        counts.put(c, counts(c) + 1)
      else
        counts.put(c, 1)
      i += 1
    }
    //println(counts)
  }
}
object Combinators extends testing.Benchmark {
  val str = "abc" * 750000
  def run() {
    val f = str.groupBy(_.toChar).map(p => (p._1, p._2.length))
  }
}
object Fold extends testing.Benchmark {
  val str = "abc" * 750000
  def run() {
    val f = str.foldLeft(Map[Char, Int]() withDefaultValue 0){(h, c) => h.updated(c, h(c)+1)}
  }
}
Results:
Imperative: $ 103 57 53 58 53 53 53 53 53 53
Combinators: $ 72 51 63 56 53 52 52 54 53 53
Fold: $ 163 62 71 62 57 57 57 58 57 57
Note that changing the imperative version to use withDefaultValue:
var counts = new scala.collection.mutable.HashMap[Char,Int].withDefaultValue(0)
var i = 0
val until = str.length
while (i < until) {
  var c = str(i)
  counts.put(c, counts(c) + 1)
  i += 1
}
is apparently much slower (roughly twice the time per run), due to forwarding each put call:
withDefaultValue: $ 133 87 109 106 101 100 101 100 101 101
Conclusion: the cost of boxing and unboxing characters in this case is high enough that the differences in performance between these approaches are hard to observe.
Update: You may want to use ScalaMeter inline benchmarking in place of the Benchmark trait.
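If neither the Benchmark trait nor ScalaMeter is at hand, a very rough stand-in can be written with System.nanoTime alone. The sketch below is an assumption-laden illustration (the names RoughTiming and timeMs are made up), with no JIT warm-up control or statistics, so its numbers should be treated as indicative at best.

```scala
object RoughTiming {
  // Runs `body` `runs` times and returns each run's wall-clock time in milliseconds.
  def timeMs[A](runs: Int)(body: => A): Seq[Long] =
    (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1000000L
    }

  def main(args: Array[String]): Unit = {
    val str = "abc" * 750000

    val combinators = timeMs(10) {
      str.groupBy(_.toChar).map(p => (p._1, p._2.length))
    }
    val fold = timeMs(10) {
      str.foldLeft(Map.empty[Char, Int]) { (m, c) => m.updated(c, m.getOrElse(c, 0) + 1) }
    }

    println("Combinators: " + combinators.mkString(" "))
    println("Fold: " + fold.mkString(" "))
  }
}
```

As in the results above, the first few runs are usually slower because the JIT has not yet compiled the hot code.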
Your Ruby example can be translated almost directly to Scala using foldLeft and an immutable Map.
Here is one possible solution:
str.foldLeft(Map[Char, Int]() withDefaultValue 0){(h, c) => h.updated(c, h(c)+1)}
Actually, if you are OK with local mutability, you can do something like this:
def charFrequencies(str: String): collection.Map[Char, Int] = {
  val hash = collection.mutable.HashMap.empty[Char, Int] withDefaultValue 0
  str foreach { hash(_) += 1 }
  hash
}
The expression hash(_) += 1 will be desugared to c => hash(c) = hash(c) + 1 and then to c => hash.update(c, hash.apply(c) + 1).
This solution should be more efficient than the functional ones, because it doesn't create intermediate collections. Also, because the method returns the read-only collection.Map[Char, Int] (which exposes no mutating methods), the result will be treated as immutable (as long as no one performs an unsafe downcast on it).
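If the possibility of a downcast is a concern, one option (not part of the answer above) is to copy the local mutable map into a genuinely immutable Map before returning it, at the cost of one extra pass over the entries. The sketch below writes the update in its desugared form and uses the hypothetical name LocalMutation.

```scala
import scala.collection.mutable

object LocalMutation {
  def charFrequencies(str: String): Map[Char, Int] = {
    // The mutable map never escapes this method.
    val counts = mutable.HashMap.empty[Char, Int]
    str.foreach { c => counts.update(c, counts.getOrElse(c, 0) + 1) }
    // toMap copies the entries into an immutable Map, so callers
    // cannot reach the mutable instance even by casting.
    counts.toMap
  }
}
```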
Starting in Scala 2.13, we can use the groupMapReduce method, which is (as its name suggests) equivalent to a groupBy followed by mapValues and a reduce step:
"hello".groupMapReduce(identity)(_ => 1)(_ + _)
// immutable.Map[Char,Int] = Map(e -> 1, h -> 1, l -> 2, o -> 1)
This:
groups characters (group part of groupMapReduce)
maps each grouped value occurrence to 1 (map part of groupMapReduce)
reduces values within a group of values (_ + _) by summing them (reduce part of groupMapReduce).
This is equivalent to the following, but performed in a single pass through the sequence of chars:
"hello".groupBy(identity).mapValues(_.map(_ => 1).reduce(_+_))
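One caveat for Scala 2.13: mapValues on a Map is deprecated there and returns a lazy view rather than a strict Map, so a groupBy-based version is usually written with .view.mapValues(...).toMap. The sketch below (Scala 2.13 or later, with made-up object and method names) puts the two variants side by side so they can be compared.

```scala
object GroupMapReduceDemo {
  // Single pass: group by the character itself, map each occurrence to 1, sum per group.
  def viaGroupMapReduce(str: String): Map[Char, Int] =
    str.groupMapReduce(identity)(_ => 1)(_ + _)

  // Two steps: build the groups first, then count each group strictly.
  def viaGroupBy(str: String): Map[Char, Int] =
    str.groupBy(identity).view.mapValues(_.length).toMap

  def main(args: Array[String]): Unit = {
    println(viaGroupMapReduce("hello"))
    println(viaGroupBy("hello"))
  }
}
```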