Generating a frequency map for a string in Scala

后端 未结 4 1149
爱一瞬间的悲伤
爱一瞬间的悲伤 2020-12-29 04:46

Let\'s say I have a string, \"hello\", and I want to generate a character frequency map:

Map[Char,Int] = Map(h -> 1, e -> 1, o -> 1, l -> 2)
         


        
4条回答
  •  谎友^
    谎友^ (楼主)
    2020-12-29 05:08

    1) What does p mean?

    groupBy takes a function which maps an elements to a key of type K. When invoked on some collection Coll, it returns a Map[K, Coll] which contains mappings from keys K to all the elements which mapped to the same key.

    So, in your case, str.groupBy(_.toChar) yields a map mapping from a key k (which is a character) to a string with all the elements (characters) c such that k == c.toChar. You get this:

    Map(e -> "e", h -> "h", l -> "ll", o -> "o")
    

    A Map is an iterable of pairs of keys and values. In this case, each pair is a character and a string of elements. Calling the map operation on a Map involves mapping on these pairs - p is a pair where p._1 is a character, and p._2 is the associated string (on which you can call length, as you did above).

    2) How to do this idiomatically

    The above is how to do it idiomatically - using groupBy and map. Alternatively, you can use an immutable map and recursion on the string length to compute the frequencies, or an immutable map and a foldLeft.

    3) Performance characteristic

    Best to benchmark to see the differences. Here are a couple of microbenchmark for a highly-repetitive string (~3GHz iMac, JDK7, Scala 2.10.0 nightly):

    object Imperative extends testing.Benchmark {
      val str = "abc" * 750000
    
      def run() {
        var counts = new scala.collection.mutable.HashMap[Char,Int]
        var i = 0
        val until = str.length
        while (i < until) {
          var c = str(i)
          if (counts.contains(c))
            counts.put(c, counts(c) + 1)
          else
            counts.put(c, 1)
          i += 1
        }
    
        //println(f)
      }
    }
    
    
    object Combinators extends testing.Benchmark {
      val str = "abc" * 750000
    
      def run() {
        val f = str.groupBy(_.toChar).map(p => (p._1, p._2.length))
      }
    }
    
    
    object Fold extends testing.Benchmark {
      val str = "abc" * 750000
    
      def run() {
        val f = str.foldLeft(Map[Char, Int]() withDefaultValue 0){(h, c) => h.updated(c, h(c)+1)}
      }
    }
    

    Results:

    • Imperative: $ 103 57 53 58 53 53 53 53 53 53

    • Combinators: $ 72 51 63 56 53 52 52 54 53 53

    • Fold: $ 163 62 71 62 57 57 57 58 57 57

    Note that changing the imperative version to use withDefaultValue:

    var counts = new scala.collection.mutable.HashMap[Char,Int].withDefaultValue(0)
    var i = 0
    val until = str.length
    while (i < until) {
      var c = str(i)
      counts.put(c, counts(c) + 1)
      i += 1
    }
    

    is apparently terribly slow due to forwarding each put call:

    • withDefaultValue: $ 133 87 109 106 101 100 101 100 101 101

    Conclusion: the boxing and unboxing of characters in this case is high-enough so that the differences in performance between these approaches are hard to observe.

    EDIT:

    Update: You may want to use ScalaMeter inline benchmarking in place of the Benchmark trait.

提交回复
热议问题