Iterate over lines in a file in parallel (Scala)?

遥遥无期 2020-12-12 22:33

I know about the parallel collections in Scala. They are handy! However, I would like to iterate over the lines of a file that is too large for memory in parallel. I coul

5 answers
  •  -上瘾入骨i
    2020-12-12 23:05

    The comments on Dan Simon's answer got me thinking. Why don't we try wrapping the Source in a Stream:

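    import scala.io.Source

    // Stream.cons takes its tail by name, so each line is read only when needed.
    def src(source: Source): Stream[String] =
      if (source.hasNext) Stream.cons(source.takeWhile(_ != '\n').mkString, src(source))
      else Stream.empty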
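To check whether the Stream wrapper on its own reads lazily, here is a small sketch. It makes a few assumptions. It uses `Source.getLines()` instead of `takeWhile`, because the Scala docs say an iterator should be discarded after calling `takeWhile` on it. The object name `StreamLaziness`, the method `firstTwoAndCount`, and the in-memory five-line input are hypothetical. It also targets Scala 2.12 or 2.13, where `Stream` still exists (2.13 deprecates it in favour of `LazyList`). The sketch counts how many lines are pulled from the source when only the first two elements of the Stream are used.

```scala
import scala.io.Source

object StreamLaziness {
  // Builds a Stream from an iterator of lines. Stream.cons takes its tail
  // by name, so the next line is pulled only when that tail is forced.
  def lines(it: Iterator[String]): Stream[String] =
    if (it.hasNext) Stream.cons(it.next(), lines(it)) else Stream.empty

  // Hypothetical demo: take the first two lines of a five-line input and
  // report how many lines were actually pulled from the underlying Source.
  def firstTwoAndCount(): (List[String], Int) = {
    var linesRead = 0
    val counted = Source.fromString("a\nb\nc\nd\ne").getLines().map { l =>
      linesRead += 1 // incremented once per line handed to the Stream
      l
    }
    val firstTwo = lines(counted).take(2).toList
    (firstTwo, linesRead)
  }

  def main(args: Array[String]): Unit =
    println(firstTwoAndCount()) // prints (List(a, b),2)
}
```

Only two of the five lines are read, so the Stream by itself does not pull in the whole input. What happens to memory once the Stream is handed to another collection is a separate question.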
    

    Then you could consume it in parallel like this:

    src(Source.fromFile(path)).par foreach process
    

    I tried this out, and at any rate it compiles and runs. I'm honestly not sure whether it loads the whole file into memory, but I don't think it does.
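The memory question depends on `.par` as well as on the Stream. According to the Scala parallel collections documentation, inherently sequential collections such as lists and streams are converted by copying their elements into a similar parallel collection (a parallel vector). That copy would force every line of the Stream. One way to bound memory is to read the file in fixed-size batches and process each batch in parallel before reading the next.

The sketch below is written under some assumptions. `BatchedLines`, `processInBatches`, and the batch size of 10000 are hypothetical choices. It uses standard-library `Future`s rather than `.par`, so it needs no extra module on Scala 2.13, where `.par` has moved to the separate scala-parallel-collections library. On 2.12 the body of the loop could instead be `batch.par.foreach(process)`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.io.Source

object BatchedLines {
  // Hypothetical helper: applies `process` to every line of `source` in
  // parallel, holding at most `batchSize` lines in memory at any one time.
  def processInBatches(source: Source, batchSize: Int)(process: String => Unit): Unit =
    source.getLines().grouped(batchSize).foreach { batch =>
      // Start one Future per line of the current batch on the global pool...
      val work = Future.traverse(batch)(line => Future(process(line)))
      // ...and wait for the whole batch before reading more of the file.
      Await.result(work, Duration.Inf)
    }

  def main(args: Array[String]): Unit = {
    val path = args(0) // hypothetical: path of the large input file
    val source = Source.fromFile(path)
    try processInBatches(source, batchSize = 10000)(line => println(line.length))
    finally source.close()
  }
}
```

The batch size trades memory for parallelism: larger batches keep the thread pool busier but hold more lines at once. Lines within a batch are processed in no particular order, and one slow line delays the start of the next batch.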
