Spark: Writing RDD Results to File System is Slow
Question: I'm developing a Spark application in Scala. My application contains only one operation that requires shuffling (namely cogroup). It runs flawlessly and in a reasonable time. The issue I'm facing arises when I write the results back to the file system; for some reason, this takes longer than running the actual computation. At first, I tried writing the results without repartitioning or coalescing, and I realized that the number of generated files was huge, so I thought that was the issue.
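For reference, here is a minimal sketch of the kind of job described above; it is not the original application. The object name, the sample data, the partition counts (200 and 8), the local master, and the default output path are all assumptions made for illustration. It shows a single cogroup followed by a coalesce before the write, since saveAsTextFile typically produces one part file per partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CogroupWriteSketch {
  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for illustration only.
    val conf = new SparkConf().setAppName("cogroup-write-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical inputs standing in for the real datasets.
      val left  = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
      val right = sc.parallelize(Seq((1, 10.0), (3, 30.0), (4, 40.0)))

      // The only shuffle in the job: cogroup into an explicitly chosen
      // number of partitions (200 is an arbitrary example value).
      val grouped = left.cogroup(right, 200)
      println(s"partitions after cogroup: ${grouped.getNumPartitions}")

      // Reduce the partition count before writing so fewer files are produced.
      // coalesce(n) does not add a shuffle; repartition(n) would add one.
      val compacted = grouped.coalesce(8)
      println(s"partitions after coalesce: ${compacted.getNumPartitions}")

      // The output path is an assumption; the directory must not exist yet.
      val outputPath = args.headOption.getOrElse("/tmp/cogroup-write-sketch-output")
      compacted
        .map { case (k, (ls, rs)) => s"$k\t${ls.mkString(",")}\t${rs.mkString(",")}" }
        .saveAsTextFile(outputPath)
    } finally {
      sc.stop()
    }
  }
}
```

One thing to keep in mind with this pattern: because coalesce is a narrow transformation, the stage that contains it runs with the coalesced number of tasks, so a small target value also limits the parallelism of the work after the shuffle, not only the number of output files.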