Scala Infinite Iterator OutOfMemory

Submitted by 落花浮王杯 on 2020-01-02 05:33:08

Question


I'm playing around with Scala's lazy iterators, and I've run into an issue. What I'm trying to do is read in a large file, do a transformation, and then write out the result:

import java.io.PrintWriter
import scala.io.Source

object FileProcessor {
  def main(args: Array[String]): Unit = {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")

    try {
      // this "basic" lazy iterator works fine
      // val iterator = inSource.getLines()

      // ...but this one, which incorporates my process method,
      // throws an OutOfMemoryError
      val iterator = process(inSource.getLines().toSeq).iterator

      while (iterator.hasNext) outSource.println(iterator.next())

    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // processing in this case just means upper-casing every line
  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}

So I'm getting an OutOfMemoryError on large files. I know you can run afoul of Scala's lazy Streams if you keep a reference to the head of the Stream. So in this case I'm careful to convert the result of process() to an iterator and throw away the Seq it initially returns.

Does anyone know why this still causes O(n) memory consumption? Thanks!


Update

In response to fge and huynhjl: it seems like the Seq might be the culprit, but I don't know why. As an example, the following code, which also uses Seq throughout, works fine and does not produce an OutOfMemoryError:

import java.io.PrintWriter
import scala.io.Source

object FileReader {
  def main(args: Array[String]): Unit = {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      writeToFile(outSource, process(inSource.getLines().toSeq))
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // writes the head, then recurses on the tail
  @scala.annotation.tailrec
  private def writeToFile(outSource: PrintWriter, contents: Seq[String]): Unit = {
    if (contents.nonEmpty) {
      outSource.println(contents.head)
      writeToFile(outSource, contents.tail)
    }
  }

  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}

Answer 1:


As hinted by fge, modify process to take an iterator and remove the .toSeq. inSource.getLines is already an iterator.

Converting to a Seq will cause the items to be remembered. I think toSeq converts the iterator into a Stream, which memoizes every element it has already produced.
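A minimal sketch of that change, assuming the same in.txt and out.txt file names as the question. The object name StreamingFileProcessor is made up for illustration; the point is only that process now maps over the Iterator directly, so each line can be garbage-collected once it has been written.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical name; a sketch of the iterator-only variant, not the asker's final code.
object StreamingFileProcessor {
  // Iterator.map is lazy: each line is transformed only when it is pulled.
  def process(lines: Iterator[String]): Iterator[String] = lines.map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      // No toSeq anywhere, so nothing holds on to lines already written.
      process(inSource.getLines()).foreach(line => outSource.println(line))
    } finally {
      inSource.close()
      outSource.close()
    }
  }
}
```

With this shape there is no intermediate collection at all, only the iterator returned by getLines and the mapped iterator wrapped around it.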

Edit: OK, it's more subtle. By calling iterator on the result of process, you are doing the equivalent of Iterator.toSeq.iterator. This can cause an OutOfMemoryError:

scala> Iterator.continually(1).toSeq.iterator.take(300*1024*1024).size
java.lang.OutOfMemoryError: Java heap space

It may be the same issue as reported here: https://issues.scala-lang.org/browse/SI-4835. Note my comment at the end of the bug report; it comes from personal experience.
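For contrast with the REPL expression above, here is a small sketch of the same count with the toSeq step dropped. The object and method names are hypothetical. Because only plain iterators are involved, no element should be retained after it has been counted.

```scala
// Hypothetical helper for illustration only.
object IteratorOnly {
  // Counts n elements of an infinite iterator without going through toSeq.
  def countWithoutSeq(n: Int): Int = Iterator.continually(1).take(n).size
}
```

Calling IteratorOnly.countWithoutSeq(300 * 1024 * 1024) walks the same number of elements as the failing expression, but without building a Seq along the way.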



Source: https://stackoverflow.com/questions/8640646/scala-infinite-iterator-outofmemory
