I have the following Spark job, in which I am trying to keep everything in memory:
import scala.collection.mutable.ListBuffer
val myOutRDD = myInRDD.flatMap { fp =>
  val tuple2List: ListBuffer[(String, myClass)] = ListBuffer()
  // ... build the (String, myClass) pairs from fp here ...
  tuple2List
}
Despite this, the job still shuffles data: I see both "Shuffle write" and "Shuffle spill" reported for it. What is the difference between the two?
"Shuffle write" refers to data that has been written to your local file system, in a temporary cache location. In YARN cluster mode, you can configure that location with the "yarn.nodemanager.local-dirs" property in yarn-site.xml. So "shuffle write" is the size of the data written to that temporary location, while "shuffle spill" is more likely the result of your shuffle stage. Either way, those figures are accumulated.
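For reference, a minimal yarn-site.xml snippet showing where that property goes; the directory paths below are placeholders for illustration, not values taken from this job:

<!-- NodeManager local scratch directories; executors write their shuffle files under these paths -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/disk1/yarn/local,/mnt/disk2/yarn/local</value>
</property>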