Question
I've tried to write a parallel Mergesort using Scala Futures. However, when I run my algorithm on a list of size 100 000 inside Eclipse's interpreter, everything gets very sluggish and eventually I get an error message telling me I'm out of memory. When I run it in the interpreter from the command line, it already hangs on lists of size 10 000 (but I get no error message).
Why does this happen and is there a fix?
import scala.actors.Future
import scala.actors.Futures._
object MergeSort {
  def sort[T <% Ordered[T]](toBeSorted: List[T]): List[T] = toBeSorted match {
    case Nil => Nil
    case List(x) => List(x)
    case someList =>
      val (left, right) = someList splitAt someList.length / 2
      // sort the left half in a new future (one future per recursive split)
      val sortedLeft = future { sort(left) }
      val sortedRight = sort(right)
      // sortedLeft() blocks the current thread until the left half is sorted
      merge(sortedLeft(), sortedRight, Nil)
  }
  def merge[T <% Ordered[T]](a: List[T], b: List[T], Ack: List[T]): List[T] = (a, b) match {
    case (Nil, ys) => Ack.reverse ++ ys
    case (xs, Nil) => Ack.reverse ++ xs
    case (x :: xs, y :: ys) if x < y => merge(xs, y :: ys, x :: Ack)
    case (x :: xs, y :: ys) => merge(x :: xs, ys, y :: Ack)
  }
}
Answer 1:
Try using Akka futures and tuning the ExecutionContext to your needs:
- http://doc.akka.io/docs/akka/2.0.1/scala/futures.html
It looks like the standard library doesn't provide good defaults for a use case like this.
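Since Scala 2.10, futures of this style are part of the standard library as `scala.concurrent.Future` (SIP-14), so the advice can also be tried without Akka. The following is a minimal sketch, not the answerer's code, assuming Scala 2.12 or later; `ParMergeSort`, `sortSeq`, `sortPar` and its `depth` parameter are hypothetical names. Compared with the question's version it changes two things: futures are forked only for the top few recursion levels, and the two halves are combined with `zip`/`map` instead of blocking a thread on `sortedLeft()` at every split.

```scala
import java.util.concurrent.Executors
import scala.annotation.tailrec
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Sketch only: ParMergeSort, sortSeq, sortPar and `depth` are hypothetical names.
object ParMergeSort {
  // Tail-recursive merge with an accumulator, the same idea as the question's merge.
  @tailrec
  def merge[T](a: List[T], b: List[T], acc: List[T] = Nil)(implicit ord: Ordering[T]): List[T] =
    (a, b) match {
      case (Nil, ys) => acc.reverse ::: ys
      case (xs, Nil) => acc.reverse ::: xs
      case (x :: xs, y :: _) if ord.lteq(x, y) => merge(xs, b, x :: acc)
      case (_, y :: ys) => merge(a, ys, y :: acc)
    }

  // Plain sequential merge sort, used below the fork depth.
  def sortSeq[T](xs: List[T])(implicit ord: Ordering[T]): List[T] =
    if (xs.lengthCompare(1) <= 0) xs
    else {
      val (l, r) = xs.splitAt(xs.length / 2)
      merge(sortSeq(l), sortSeq(r))
    }

  // Forks futures only for the top `depth` levels of the recursion. Sub-results are
  // combined with zip/map, so no pool thread sits blocked waiting for another task.
  def sortPar[T](xs: List[T], depth: Int)(implicit ord: Ordering[T], ec: ExecutionContext): Future[List[T]] =
    if (depth <= 0 || xs.lengthCompare(1) <= 0) Future(sortSeq(xs))
    else {
      val (l, r) = xs.splitAt(xs.length / 2)
      sortPar(l, depth - 1).zip(sortPar(r, depth - 1)).map { case (a, b) => merge(a, b) }
    }

  def main(args: Array[String]): Unit = {
    // Bounded pool: one thread per available core (an assumed, tunable choice).
    val pool = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val data = List.fill(100000)(scala.util.Random.nextInt())
      val sorted = Await.result(sortPar(data, depth = 3), 1.minute)
      println(sorted == data.sorted)
    } finally pool.shutdown()
  }
}
```

With `depth = 3` at most 8 leaf tasks are created, whatever the input size; choosing the depth relative to the core count is a tuning assumption, not a rule.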
Answer 2:
As Rex pointed out, the overhead of any Future API is sizable and should not be ignored.
Don't waste precious CPU and memory on context-switch overhead. Split your list into reasonably sized chunks and sort each chunk within a single thread.
For example, if your machine has 4 cores and 4 GB of memory, you can split the data into 500 MB chunks and run up to 4 merge sorts simultaneously. This maximizes your throughput and parallelism.
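The chunking idea can be sketched in memory as follows. This is an illustrative assumption, not the answerer's implementation (his blog describes an external sort): the data is cut into a chosen number of pieces, each piece is sorted sequentially inside a single `Future`, and the sorted pieces are merged pairwise. `ChunkedSort`, `chunkedSort`, `mergeTwo` and `mergeAll` are hypothetical names; Scala 2.12 or later is assumed.

```scala
import java.util.concurrent.Executors
import scala.annotation.tailrec
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// In-memory sketch only; ChunkedSort, chunkedSort, mergeTwo and mergeAll are hypothetical.
object ChunkedSort {
  // Linear merge of two already-sorted vectors.
  def mergeTwo[T](a: Vector[T], b: Vector[T])(implicit ord: Ordering[T]): Vector[T] = {
    val out = Vector.newBuilder[T]
    out.sizeHint(a.length + b.length)
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      if (ord.lteq(a(i), b(j))) { out += a(i); i += 1 }
      else { out += b(j); j += 1 }
    }
    out ++= a.drop(i)
    out ++= b.drop(j)
    out.result()
  }

  // Merges sorted runs pairwise until a single run remains.
  @tailrec
  def mergeAll[T](runs: Vector[Vector[T]])(implicit ord: Ordering[T]): Vector[T] =
    if (runs.isEmpty) Vector.empty
    else if (runs.length == 1) runs.head
    else mergeAll(runs.grouped(2).map(p => if (p.length == 2) mergeTwo(p(0), p(1)) else p(0)).toVector)

  // Cuts `data` into `chunks` pieces and sorts each piece sequentially in its own Future.
  // On a fixed-size pool, at most pool-size pieces are sorted at once; the rest queue.
  def chunkedSort[T](data: Vector[T], chunks: Int)(implicit ord: Ordering[T], ec: ExecutionContext): Future[Vector[T]] = {
    require(chunks > 0, "chunks must be positive")
    val size = math.max(1, (data.length + chunks - 1) / chunks) // ceiling division
    val pieces = data.grouped(size).toVector
    Future.traverse(pieces)(p => Future(p.sorted)).map(mergeAll(_))
  }

  def main(args: Array[String]): Unit = {
    val threads = Runtime.getRuntime.availableProcessors()
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val data = Vector.fill(1000000)(scala.util.Random.nextInt())
      // Twice as many chunks as threads, echoing "8 chunks of 500 MB, 4 at a time".
      val sorted = Await.result(chunkedSort(data, chunks = 2 * threads), 1.minute)
      println(sorted == data.sorted)
    } finally pool.shutdown()
  }
}
```

Using more chunks than threads keeps every thread busy while the extra chunks wait in the pool's queue. In this sketch the final pairwise merge runs on a single thread, which is a simplification.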
You can use SIP-14's ExecutionContext to limit the number of threads used:
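import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
// one worker thread per available core
private val GLOBAL_THREAD_LIMIT = Runtime.getRuntime.availableProcessors()
// implicit, so Futures created in this scope run on the bounded pool
private lazy implicit val executionContext =
  ExecutionContext.fromExecutorService(
    Executors.newFixedThreadPool(GLOBAL_THREAD_LIMIT)
  )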
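One practical detail the snippet leaves out: threads created by `Executors.newFixedThreadPool` are non-daemon, so the pool should be shut down when the work is finished, or the JVM will not exit. The small check below (a hypothetical `ThreadLimitDemo`, Scala 2.12 or later assumed) runs many short futures on such a pool and records the highest number running at once, which should never exceed the pool size.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical demo object; Executors and ExecutionContext are the APIs from the answer.
object ThreadLimitDemo {
  // Runs `tasks` short jobs on a fixed pool of `limit` threads and returns the highest
  // number of jobs that were observed running at the same moment.
  def maxObservedConcurrency(limit: Int, tasks: Int): Int = {
    val pool = Executors.newFixedThreadPool(limit)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val running = new AtomicInteger(0)
    val lock = new Object
    var maxSeen = 0
    try {
      val jobs = (1 to tasks).map { _ =>
        Future {
          val now = running.incrementAndGet()
          lock.synchronized { if (now > maxSeen) maxSeen = now }
          Thread.sleep(10) // simulate a little work
          running.decrementAndGet()
        }
      }
      Await.result(Future.sequence(jobs), 10.seconds)
      lock.synchronized(maxSeen)
    } finally {
      // Threads from newFixedThreadPool are non-daemon: without shutdown() they
      // would keep the JVM alive after main returns.
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(maxObservedConcurrency(limit = Runtime.getRuntime.availableProcessors(), tasks = 50))
}
```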
By the way, I have implemented a parallel external merge sort with SIP-14 futures and explained the implementation details on my blog: http://blog.yunglinho.com/blog/2013/03/19/parallel-external-merge-sort/
Source: https://stackoverflow.com/questions/15589502/scala-parallel-mergesort-out-of-memory