Scala immutable Map slow

Posted by 时光毁灭记忆、已成空白 on 2019-12-04 12:58:11

(0 to 2000000).toList and (0 to 2000000).map(x => x -> x).toMap have similar performance if you give them enough memory (I tried -Xmx4G, i.e. 4 gigabytes). The toMap implementation does a lot of cloning, so a lot of memory is allocated and then discarded. When memory is scarce, the GC therefore becomes overactive.

When I ran (0 to 2000000).toList with 128 MB, it took several seconds, but (0 to 2000000).map(x => x -> x).toMap took at least 2 minutes with 10% GC activity (according to VisualVM) and then died with an out-of-memory error.

However, when I tried -Xmx4G both were pretty fast.
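To reproduce the comparison, a minimal timing sketch along these lines could be run under different -Xmx settings. The object name ListVsMapTiming and the helper timed are hypothetical names introduced for illustration, and a single wall-clock measurement like this is only a rough indication, not a proper benchmark.

```scala
object ListVsMapTiming {
  // Hypothetical helper: evaluates `body` once and returns (result, elapsed milliseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // Upper bound of the range; defaults to the size used in the answer.
    val n = if (args.nonEmpty) args(0).toInt else 2000000

    val (list, listMs) = timed((0 to n).toList)
    val (map, mapMs) = timed((0 to n).map(x => x -> x).toMap)

    println(s"toList: ${list.size} elements in $listMs ms")
    println(s"toMap:  ${map.size} entries in $mapMs ms")
  }
}
```

Running it once with a small heap and once with a large one should make the difference described above visible; the exact numbers will depend on the JVM and the Scala version.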


P.S. What toMap does is repeatedly add an element to a prefix tree, so it has to clone (Array.copy) a lot for every element: https://github.com/scala/scala/blob/99a82be91cbb85239f70508f6695c6b21fd3558c/src/library/scala/collection/immutable/HashMap.scala#L321.

So toMap calls updated0 repeatedly (2000000 times), which in turn does an Array.copy pretty often. That requires lots of memory allocations, which (in the low-memory case) causes the GC to run MarkAndSweep (slow garbage collection) most of the time, as far as I can see from jconsole.
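The behaviour described above can be pictured as a fold that adds one pair at a time, where every + returns a new immutable map and leaves the previous one as garbage. This is only a sketch of the idea, assuming the library version linked above builds the map this way; the object name ToMapAsFold and the method buildByFold are hypothetical, and newer Scala versions may build maps differently.

```scala
object ToMapAsFold {
  // Adds one pair per step; each `+` produces a new immutable map,
  // so the intermediate maps become garbage almost immediately.
  def buildByFold(n: Int): Map[Int, Int] =
    (0 to n).foldLeft(Map.empty[Int, Int]) { (acc, x) => acc + (x -> x) }

  def main(args: Array[String]): Unit = {
    val n = 1000
    val folded = buildByFold(n)
    val direct = (0 to n).map(x => x -> x).toMap

    // Both approaches end up with the same contents.
    println(s"same contents: ${folded == direct}")
    println(s"entries: ${folded.size}")
  }
}
```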


Solution: Either increase the memory (-Xmx/-Xms JVM parameters) or, if you need more complex operations on your data set, use something like Apache Spark (or any batch-oriented map-reduce framework) to process your data in a distributed way.
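When changing -Xmx, it can help to confirm that the setting actually reached the JVM running your code. A small check, using the standard Runtime API (the object name HeapCheck is a hypothetical name for illustration), might look like this:

```scala
object HeapCheck {
  // Maximum heap the JVM will attempt to use, in megabytes.
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory() / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"max heap: $maxHeapMb MB")
}
```

If the printed value does not reflect the -Xmx you passed, the option probably went to a different process, for example a build tool's launcher rather than the forked application.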
