Recently we migrated from "EMR on HDFS" to "EMR on S3" (EMRFS with consistent view enabled), and we realized that Spark 'saveAsTable' (Parquet format) writes to S3 were ~4x
I think the S3 committer from Netflix is already open sourced at: https://github.com/rdblue/s3committer.