Spark shuffle spill metrics

Submitted by 故事扮演 on 2021-02-18 06:54:09

Question


Running jobs on a Spark 2.3 cluster, I noticed in the Spark web UI that spill occurs for some tasks:

I understand that on the reduce side, the reducer fetches the needed partitions (shuffle read) and then performs the reduce computation using the executor's execution memory. Since there was not enough execution memory, some data was spilled.
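For reference, here is a minimal sketch of how the per-task counters behind the web UI's "Shuffle spill (memory)" and "Shuffle spill (disk)" columns can be read through a SparkListener. The job, the partition counts, and the `spark.memory.fraction` value are illustrative only, not my actual workload; shrinking the unified memory region just makes reduce-side spilling more likely.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession

object ShuffleSpillInspection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-spill-inspection")
      .master("local[2]")
      // Illustrative: shrinking the unified memory region makes
      // reduce-side spilling more likely.
      .config("spark.memory.fraction", "0.2")
      .getOrCreate()
    val sc = spark.sparkContext

    // Print the per-task counters that back the web UI's
    // "Shuffle spill (memory)" and "Shuffle spill (disk)" columns.
    sc.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null && (m.memoryBytesSpilled > 0 || m.diskBytesSpilled > 0)) {
          println(s"task ${taskEnd.taskInfo.taskId}: " +
            s"memoryBytesSpilled=${m.memoryBytesSpilled} " +
            s"diskBytesSpilled=${m.diskBytesSpilled}")
        }
      }
    })

    // A wide transformation: groupByKey shuffles the data and builds the
    // grouped values on the reduce side in execution memory, spilling
    // when that memory runs out.
    val groups = sc.range(0L, 10000000L, numSlices = 16)
      .map(i => (i % 1000, i))
      .groupByKey()
      .mapValues(_.size)
      .count()
    println(s"groups: $groups")

    spark.stop()
  }
}
```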

My questions:

  1. Am I correct?
  2. Where is the data spilled? The Spark web UI states that some data is spilled to memory (Shuffle spill (memory)), but nothing is spilled to disk (Shuffle spill (disk)).

Thanks in advance for your help

来源:https://stackoverflow.com/questions/51103971/spark-shuffle-spill-metrics
