I'm trying to write a parquet file out to Amazon S3 using Spark 1.6.1. The small parquet that I'm generating is
I also had this issue. In addition to what the others have said, here is a full explanation from AWS: https://aws.amazon.com/blogs/big-data/improve-apache-spark-write-performance-on-apache-parquet-formats-with-the-emrfs-s3-optimized-committer/
In my experiments, just switching to FileOutputCommitter v2 (from v1) improved write performance 3-4x.
# self.sc is a SparkContext; apply this setting before the write is triggered
self.sc._jsc.hadoopConfiguration().set("mapreduce.fileoutputcommitter.algorithm.version", "2")
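For context, here is a minimal self-contained PySpark sketch of the same change (a sketch only, assuming Spark 1.6.x with an s3a:// filesystem configured; "example-bucket" is a placeholder, and _jsc is an internal handle to the underlying Java SparkContext rather than a public API):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setAppName("parquet-s3-write"))
sqlContext = SQLContext(sc)

# v2 moves task output straight to the destination at task-commit time,
# avoiding v1's serial job-commit renames (slow on S3, where rename is a copy).
sc._jsc.hadoopConfiguration().set(
    "mapreduce.fileoutputcommitter.algorithm.version", "2")

df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.write.mode("overwrite").parquet("s3a://example-bucket/output/")

Note that the setting must be in place before the write action runs, since that is when the output committer is instantiated.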