Meaning of Apache Spark warning “Calling spill() on RowBasedKeyValueBatch”

Posted by 自作多情 on 2020-04-10 07:06:48

Question


I'm running a PySpark 2.2.0 job in Apache Spark local mode and see the following warning:

WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.

What could be the reason for this warning? Is it something I should care about, or can I safely ignore it?


Answer 1:


As indicated here, this warning means that your RAM is full and that part of its contents is being moved to disk.

See also the Spark FAQ:

Does my data need to fit in memory to use Spark?

No. Spark's operators spill data to disk if it does not fit in memory, allowing it to run well on any sized data. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level.




Answer 2:


I think this message is worse than a simple warning: it is on the edge of being an error.

Have a look at the source code:

  /**
   * Sometimes the TaskMemoryManager may call spill() on its associated MemoryConsumers to make
   * space for new consumers. For RowBasedKeyValueBatch, we do not actually spill and return 0.
   * We should not throw OutOfMemory exception here because other associated consumers might spill
   */
  public final long spill(long size, MemoryConsumer trigger) throws IOException {
    logger.warn("Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.");
    return 0;
  }

Source: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java

So I would say that you are in an infinite loop of "needing to spill but not actually spilling".
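To make the Javadoc comment above easier to follow, here is a toy model in plain Python. It is only a sketch, not Spark's real TaskMemoryManager: the names `MemoryConsumer`, `NonSpillingBatch`, and `acquire` and all the byte counts are hypothetical. It shows a memory manager asking each consumer to spill, one consumer that logs the warning and returns 0, and another consumer that actually frees memory.

```python
from typing import List


class MemoryConsumer:
    """Toy stand-in for a memory consumer: holds memory and can release some."""

    def __init__(self, name: str, used: int) -> None:
        self.name = name
        self.used = used

    def spill(self, size: int) -> int:
        """Release up to `size` bytes (pretend they went to disk); return bytes freed."""
        freed = min(size, self.used)
        self.used -= freed
        return freed


class NonSpillingBatch(MemoryConsumer):
    """Mimics the spill() method quoted above: warn, free nothing, return 0."""

    def spill(self, size: int) -> int:
        print("WARN %s: Calling spill() on %s. Will not spill but return 0."
              % (self.name, self.name))
        return 0


def acquire(required: int, capacity: int, consumers: List[MemoryConsumer]) -> int:
    """Ask consumers in turn to spill until `required` bytes are free.

    Returns the number of bytes that can be granted, which may be less
    than `required` if no consumer frees enough memory.
    """
    free = capacity - sum(c.used for c in consumers)
    for consumer in consumers:
        if free >= required:
            break
        free += consumer.spill(required - free)
    return min(free, required)


# Hypothetical numbers: 100 bytes of capacity, 90 already in use.
batch = NonSpillingBatch("RowBasedKeyValueBatch", used=60)
sorter = MemoryConsumer("sorter", used=30)

# The batch logs the warning and frees nothing; the sorter then frees 30 bytes.
granted = acquire(required=40, capacity=100, consumers=[batch, sorter])
print("granted:", granted)
```

In this sketch the request is still satisfied because the second consumer spills. If the non-spilling batch were the only consumer, `acquire` would grant less than was requested, which is the situation where the warning deserves attention.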



Source: https://stackoverflow.com/questions/46907447/meaning-of-apache-spark-warning-calling-spill-on-rowbasedkeyvaluebatch
