Meaning of Apache Spark warning “Calling spill() on RowBasedKeyValueBatch”
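Question

I'm running a PySpark 2.2.0 job in Apache Spark local mode and see the following warning:

WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.

What could be the reason for this warning? Is it something I should care about, or can I safely ignore it?

Answer 1:

As indicated here, this warning means that memory is running short: Spark asked its in-memory data structures to free space by spilling to disk. As the message itself says, RowBasedKeyValueBatch does not spill and returns 0, so any memory that is freed comes from other structures moving part of their contents to disk. The warning by itself is not an error, but frequent occurrences suggest the job is under memory pressure and may run slower. See also the Spark FAQ:

Does my data need to fit in memory to use Spark? No.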
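If the warning shows up often, one thing to try is giving the local-mode JVM more memory, or splitting the work into more partitions so each task holds less data. The following is only a sketch under stated assumptions: the helper names memory_conf and build_session, the app name, and the values "4g" and 400 are hypothetical and not taken from the question. The configuration keys spark.driver.memory and spark.sql.shuffle.partitions are standard Spark settings.

```python
def memory_conf(driver_memory="4g", shuffle_partitions=400):
    """Return Spark settings that may ease memory pressure in local mode.

    Both default values are illustrative assumptions, not recommendations.
    """
    return {
        # In local mode the tasks run inside the driver JVM,
        # so the driver heap is the one that matters.
        "spark.driver.memory": driver_memory,
        # More shuffle partitions means less data per aggregation task.
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
    }


def build_session(conf, app_name="spill-warning-demo"):
    """Create a local SparkSession with the given settings (hypothetical helper)."""
    # Imported lazily so memory_conf() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (requires pyspark):
#   spark = build_session(memory_conf(driver_memory="8g"))
```

Note that spark.driver.memory only takes effect if it is set before the JVM starts. Depending on the Spark version and how the job is launched, it may need to be passed as --driver-memory to spark-submit instead of being set in code.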