My cluster: 1 master, 11 slaves, each node has 6 GB memory.
My settings:
spark.executor.memory=4g, -Dspark.akka.frameSize=512
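
For reference, here is a minimal sketch of how these two settings might look in `conf/spark-defaults.conf`. It assumes a Spark 1.x release, where `spark.akka.frameSize` (in MB) still exists; from Spark 2.0 onward the Akka-based setting was replaced by `spark.rpc.message.maxSize`. The file location is an assumption, since the question does not say how the settings were passed.

```
# Assumed location: conf/spark-defaults.conf (Spark 1.x)
# Heap size per executor; nodes in this cluster have 6 GB each
spark.executor.memory   4g
# Maximum driver/executor message size in MB (Spark 1.x only)
spark.akka.frameSize    512
```

The same values can also be supplied per job with `--conf key=value` on `spark-submit`.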
From my understanding of the code provided above, it loads the file, performs a map operation, and saves the result back. No operation requires a shuffle, and none requires data to be brought to the driver, so tuning anything related to shuffle or the driver will probably have no impact. The driver does have issues when there are too many tasks, but that was only the case up to Spark 2.0.2. Two things could be going wrong.