Efficiency/scalability of parallel collections in Scala (graphs)

人走茶凉 提交于 2019-12-04 17:03:10

From your comments, I see that your updating a shared mutable HashMap at the end of each algorithm run. And if you're randomizing your walks, a shared Random is also a contention point.

I recommend two changes:

  1. Use .map and .flatMap to return an immutable collection instead of modifying a shared collection.
  2. Use a ThreadLocalRandom (from either Akka or Java 7) to reduce contention on the random number generator
  3. Check the rest of your algorithm for further possible contention points.
  4. You may try running the inner loop in parallel, too. But without knowing your algorithm, it's hard to know if that will help or hurt. Fortunately, running all combinations of parallel and sequential collections is very simple; just switch out pVertexList and vertexList in the code below.

Something like this:

val pVertexList = vertexList.par
val allResult = for {
  u <- pVertexList
  v <- pVertexList
} yield {
  /* Run algorithm between u and v */
  ((u -> v) -> result)
}

The value allResult will be a ParVector[((Int, Int), Int)]. You may call .toMap on it to convert that into a Map.

Daniel C. Sobral

Why mutable? I don't think there's a good parallel mutable map on Scala 2.9.x -- particularly because just such a data structure was added to the upcoming Scala 2.10.

On the other hand... you have a List[Int]? Don't use that, use a Vector[Int]. Also, are you sure you aren't wasting time elsewhere, doing the conversions from your mutable maps and buffers into immutable lists? Scala data structures are different than C++'s so you might well be incurring in complexity problems elsewhere in the code.

Finally, I think dave might be onto something when he asks about contention. If you have contention, parallelism might well make things slower. How faster/slower does it run if you do not make it parallel? If making it not parallel makes it faster, then you most likely do have contention issues.

I'm not completely sure about it, but I think foreach loops in foreach loops are rather slow, because lots of objects get created. See: http://scala-programming-language.1934581.n4.nabble.com/for-loop-vs-while-loop-performance-td1935856.html

Try rewriting it using a while loop.

Also Lists are only efficient for head access, Arrays are probably faster.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!