Spark: Difference between Shuffle Write, Shuffle spill (memory), Shuffle spill (disk)?

暖寄归人 2021-02-01 17:29

I have the following Spark job, which tries to keep everything in memory:

val myOutRDD = myInRDD.flatMap { fp =>
  val tuple2List: ListBuffer[(String, myClass)] = …
3 Answers
  •  时光说笑
    2021-02-01 18:05

    shuffle data

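    "Shuffle write" is the amount of data written to your local file system, in a temporary location, by the map side of a shuffle so that tasks in the next stage can read it. In YARN cluster mode, you can set that location with the "yarn.nodemanager.local-dirs" property in yarn-site.xml. So "shuffle write" is the size of the data you've written to that temporary location.

    "Shuffle spill", by contrast, is data that did not fit in memory while a task was processing shuffle data and therefore had to be spilled to disk. "Shuffle spill (memory)" is the size of the deserialized form of that data in memory, and "Shuffle spill (disk)" is the size of its serialized form on disk, which is why the two numbers can differ a lot. In any case, all of these figures are cumulative: the stage values are sums over the stage's tasks.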
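
To read these three numbers outside the UI, one option is a SparkListener that adds up the per-task metrics, much like the UI adds task values up per stage. This is a minimal sketch, assuming Spark 2.x or later on the classpath (where TaskMetrics.shuffleWriteMetrics is not an Option); the names ShuffleMetricsListener and ShuffleMetricsDemo and the toy groupByKey job are hypothetical, chosen only for illustration.

```scala
import java.util.concurrent.atomic.LongAdder

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession

// Hypothetical listener: sums per-task shuffle metrics across all finished tasks.
// Assumes Spark 2.x+, where taskMetrics.shuffleWriteMetrics is a plain value.
class ShuffleMetricsListener extends SparkListener {
  val shuffleBytesWritten = new LongAdder // UI "Shuffle Write"
  val memoryBytesSpilled = new LongAdder  // UI "Shuffle Spill (Memory)": deserialized size
  val diskBytesSpilled = new LongAdder    // UI "Shuffle Spill (Disk)": serialized size

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) { // metrics can be missing, e.g. for some failed tasks
      shuffleBytesWritten.add(m.shuffleWriteMetrics.bytesWritten)
      memoryBytesSpilled.add(m.memoryBytesSpilled)
      diskBytesSpilled.add(m.diskBytesSpilled)
    }
  }
}

object ShuffleMetricsDemo {
  // Runs a small job with one shuffle and returns
  // (shuffle write, spill memory, spill disk), all in bytes.
  def run(): (Long, Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("shuffle-metrics-demo")
      .getOrCreate()
    val listener = new ShuffleMetricsListener
    spark.sparkContext.addSparkListener(listener)
    try {
      // groupByKey forces a shuffle: map tasks write shuffle files, reduce tasks read them.
      spark.sparkContext
        .parallelize(1 to 100000, 8)
        .map(i => (i % 100, i))
        .groupByKey()
        .count()
    } finally {
      // Listener events are delivered asynchronously; stopping the context
      // lets already-queued events be processed before the bus shuts down.
      spark.stop()
    }
    (listener.shuffleBytesWritten.sum(),
     listener.memoryBytesSpilled.sum(),
     listener.diskBytesSpilled.sum())
  }

  def main(args: Array[String]): Unit = {
    val (written, spillMem, spillDisk) = run()
    println(s"shuffle write: $written B, spill (memory): $spillMem B, spill (disk): $spillDisk B")
  }
}
```

On a small local job like this, both spill values will typically be 0; they only grow once a task's shuffle data no longer fits in the execution memory available to it, while shuffle write is non-zero whenever a stage produces shuffle output.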
