Scala map() on a Map[..] much slower than mapValues()

泄露秘密 submitted on 2019-12-23 23:41:10

Question


In a Scala program I wrote, I have a scala.collection.Map that maps a String to some calculated values (specifically, it's Map[String, (Double, immutable.Map[String, Double], Double)]; I know that's ugly and should be (and will be) wrapped). Now, if I do this:

stats.map { case (c, (prior, pwc, denom)) =>
  println(c)
  // ... (rest of the calculation)
}

it takes about 30 seconds to print roughly 50 values of c! The println is just a test statement; the actual calculation I need was even slower (I aborted it after a minute of complete silence). However, if I do it like this:

stats.mapValues { case (prior, pwc, denom) =>
  println(prior)
  // ... (rest of the calculation)
}

I don't run into these performance issues ... Can anyone explain why this is happening? Am I not following some important Scala guidelines?

Thanks for the help!

Edit:

I investigated the behaviour further. My guess is that the bottleneck comes from accessing the Map data structure. If I do the following, I have the same performance issues:

classes.foreach{c => {
  println(c)
  val ps = stats(c)
  }
}

Here, classes is a List[String] that stores the keys of the Map externally. Without the access to stats(c), there is no performance loss.
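The question does not show how `stats` is built. Assuming it was itself produced lazily with `mapValues` somewhere upstream (the situation the answer below points at), the following sketch reproduces the pattern from the edit: every `stats(c)` re-runs the transformation, so a slow calculation is repeated on each access instead of being cached. It assumes Scala 2.13 or later, where the lazy form is spelled `m.view.mapValues(f)`; `expensive`, `rawStats` and the sample keys are hypothetical stand-ins.

```scala
// Counts how often the (hypothetical) expensive calculation actually runs.
var evaluations = 0

// Hypothetical stand-in for the question's slow per-class calculation.
def expensive(raw: Int): Double = {
  evaluations += 1
  Thread.sleep(10) // simulate slow work
  raw / 2.0
}

val rawStats: Map[String, Int] = Map("x" -> 1, "y" -> 2)

// Assumption: `stats` was built lazily like this somewhere upstream.
val stats = rawStats.view.mapValues(v => expensive(v))

// Keys kept in a separate list, as in the edit.
val classes: List[String] = List("x", "y")

// Two passes over the keys: nothing is cached, so each stats(c) recomputes.
for (pass <- 1 to 2; c <- classes) {
  val ps = stats(c)
  println(s"pass $pass: $c -> $ps")
}
// evaluations is now 4: 2 passes x 2 keys
```

With a real multi-second calculation in place of `expensive`, each lookup pays the full cost again, which would match the slowdown described in the edit.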


Answer 1:


mapValues actually returns a view on the original map, which can lead to unexpected performance issues. From this blog post:

...here is a catch: map and mapValues are different in a not-so-subtle way. mapValues, unlike map, returns a view on the original map. This view holds references to both the original map and to the transformation function (here (_ + 1)). Every time the returned map (view) is queried, the original map is first queried and the transformation function is called on the result.
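A minimal sketch of the behaviour the quote describes, using the same `_ + 1` transformation with a call counter (Scala 2.13 or later assumed, where the lazy form is `m.view.mapValues(f)`; in Scala 2.12 and earlier, `m.mapValues(f)` as used in the question returned a similar lazy view). `map` runs its function once per entry immediately, while the view runs nothing until a value is looked up and then runs again on every lookup. This probably also explains the question's observation: the `println(prior)` inside `mapValues` only fires when something reads from the resulting view.

```scala
// Counts how many times the transformation runs.
var calls = 0
val plusOne: Int => Int = { v => calls += 1; v + 1 }

val original: Map[String, Int] = Map("a" -> 1, "b" -> 2, "c" -> 3)

// map is strict: plusOne runs once per entry, immediately.
val mapped: Map[String, Int] = original.map { case (k, v) => k -> plusOne(v) }
val callsAfterMap = calls // 3

// view.mapValues is lazy: creating the view runs plusOne zero times.
calls = 0
val viewed = original.view.mapValues(plusOne)
val callsAfterView = calls // 0

// Every query goes to `original` first and then applies plusOne again.
viewed("a"); viewed("a")
val callsAfterTwoQueries = calls // 2
```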

I recommend reading the rest of that post for some more details.
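When repeated evaluation is not wanted, one common option is to materialize the values into a strict `Map` once, so that later lookups are plain reads. The sketch below (Scala 2.13 or later assumed; names are illustrative) shows two ways: copying the view with `.toMap`, and skipping the view entirely with a strict `map { case (k, v) => k -> f(v) }`, which behaves the same way in any Scala version.

```scala
// Counts how many times the transformation runs.
var calls = 0
val plusOne: Int => Int = { v => calls += 1; v + 1 }
val original: Map[String, Int] = Map("a" -> 1, "b" -> 2)

// Option 1 (Scala 2.13+): build the lazy view, then copy it into a strict Map.
val forced: Map[String, Int] = original.view.mapValues(plusOne).toMap
val callsToBuild = calls // 2: one call per entry while copying

// Lookups on the strict Map are ordinary reads; plusOne is not called again.
forced("a"); forced("a"); forced("b")
val callsAfterLookups = calls // still 2

// Option 2 (any Scala version): skip the view and use a strict map directly.
val direct: Map[String, Int] = original.map { case (k, v) => k -> plusOne(v) }
```

The trade-off is memory and up-front work: a strict copy computes every value once even if only a few are ever read, while the view computes only what is read but repeats that work on each access.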



Source: https://stackoverflow.com/questions/26982743/scala-map-on-a-map-much-slower-than-mapvalues
