Parallel iterator in Scala

自闭症患者 2020-12-06 00:43

Is it somehow possible, using Scala's parallel collections, to parallelize an Iterator without evaluating it completely beforehand?

Here I am t

4 answers
  • 2020-12-06 01:22

    It's a bit hard to follow exactly what you're after, but perhaps it's something like this:

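    val f = (x: Int) => x + 1
    // Split before mapping: mapping the lazy Stream first would run f sequentially.
    val (left, right) = (0 to 9).toStream.splitAt(6)
    val done = left.toVector.par.map(f) // f runs in parallel on the first 6 elements
    val s = right.map(f)                // lazy stream over the remaining elements

    This evaluates f on the first 6 elements in parallel and then returns a stream over the rest. (On Scala 2.13 and later, .par requires the separate scala-parallel-collections module and import scala.collection.parallel.CollectionConverters._.)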

  • 2020-12-06 01:26

    I realize that this is an old question, but does the ParIterator implementation in the iterata library do what you were looking for?

    scala> import com.timgroup.iterata.ParIterator.Implicits._
    scala> val it = (1 to 100000).toIterator.par().map(n => (n + 1, Thread.currentThread.getId))
    scala> it.map(_._2).toSet.size
    res2: Int = 8 // addition was distributed over 8 threads
    
  • 2020-12-06 01:36

    From the scala-user mailing list thread "Traversing iterator elements in parallel":

    https://groups.google.com/d/msg/scala-user/q2NVdE6MAGE/KnutOq3iT3IJ

    I moved off Future.traverse for a similar reason. For my use case, keeping N jobs working, I wound up with code to throttle feeding the execution context from the job queue.

    My first attempt involved blocking the feeder thread, but that risked also blocking tasks which wanted to spawn tasks on the execution context. What do you know, blocking is evil.
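    The throttling idea described above can be sketched without blocking any thread. This is only an illustration, not the poster's code: the names Throttled and foreachThrottled are hypothetical, and the approach assumed here is n chained "workers" that each pull the next element from the shared iterator once their previous job has finished.

    ```scala
    import scala.concurrent.{ExecutionContext, Future}

    object Throttled {
      // Hypothetical helper: runs `job` on every element of `it`,
      // keeping at most `n` jobs in flight and pulling elements on demand.
      def foreachThrottled[A](it: Iterator[A], n: Int)(job: A => Unit)(
          implicit ec: ExecutionContext): Future[Unit] = {

        // Iterators are not thread-safe, so access is serialized.
        def pull(): Option[A] = it.synchronized {
          if (it.hasNext) Some(it.next()) else None
        }

        // One worker: run a job, then asynchronously continue with the next element.
        def worker(): Future[Unit] = pull() match {
          case Some(a) => Future(job(a)).flatMap(_ => worker())
          case None    => Future.successful(())
        }

        // n workers means at most n jobs run at the same time.
        Future.sequence(List.fill(n)(worker())).map(_ => ())
      }
    }
    ```

    The returned Future completes once the iterator is exhausted and every job has finished. If a job throws, that worker's chain fails and the returned Future fails with it, while the other workers keep draining the iterator.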

  • 2020-12-06 01:37

    Your best bet with the standard library is probably not using parallel collections but concurrent.Future.traverse:

    import concurrent._
    import ExecutionContext.Implicits.global
    Future.traverse(Iterator(1,2,3))(i => Future{ i*i })
    

    though I think this will consume the whole iterator and start executing everything as soon as it can.
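    One way to keep the iterator from being consumed all at once is to traverse it one batch at a time. The following is a minimal sketch under that assumption: BatchedTraverse and parMapBatched are hypothetical names, and the Await call blocks the consuming thread while each batch runs, which may not suit every use case.

    ```scala
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    object BatchedTraverse {
      // Hypothetical helper: applies f in parallel within each batch of
      // `batchSize` elements, pulling batches from `it` only as the result is consumed.
      def parMapBatched[A, B](it: Iterator[A], batchSize: Int)(f: A => B): Iterator[B] =
        it.grouped(batchSize).flatMap { batch =>
          // Future.traverse keeps the order of the batch; Await blocks until it is done.
          Await.result(Future.traverse(batch)(a => Future(f(a))), Duration.Inf)
        }
    }
    ```

    Because grouped and flatMap on an Iterator are lazy, elements are read from the source only when the resulting iterator is advanced. Parallelism is limited to one batch at a time, so a slow element holds up the rest of its batch.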
