Spark 1.6 DirectFileOutputCommitter
Question: I am having a problem saving text files to S3 using pyspark. I am able to save to S3, but Spark first uploads the output to a _temporary prefix on S3 and then copies it to the intended location. This increases the job's run time significantly. I have attempted to compile a DirectFileOutputCommitter, which should write directly to the intended S3 URL, but I cannot get Spark to use this class.

Example: someRDD.saveAsTextFile("s3a://somebucket/savefolder") this creates a s3a://somebucket/savefolder/
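For reference, this is how I have been trying to point Spark 1.6 at the committer. This is a sketch only: it assumes a DirectFileOutputCommitter class is already compiled and on the executor classpath (the fully qualified class name below is a placeholder for whatever package the compiled committer actually lives in), and it uses the `spark.hadoop.*` prefix, which Spark forwards into the Hadoop configuration. `saveAsTextFile` goes through the old `mapred` API, so the relevant key is `mapred.output.committer.class`:

```
# Sketch, not a verified fix: assumes the committer class below exists on the classpath.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("direct-commit-test")
        # Old mapred API key, used by saveAsTextFile:
        .set("spark.hadoop.mapred.output.committer.class",
             "com.example.hadoop.DirectFileOutputCommitter")  # hypothetical package
        # Recommended when skipping the rename step, since there is
        # no _temporary output left to recover from:
        .set("spark.speculation", "false"))

sc = SparkContext(conf=conf)
sc.parallelize(["a", "b", "c"]).saveAsTextFile("s3a://somebucket/savefolder")
```

The same keys can also be set at submit time with `--conf spark.hadoop.mapred.output.committer.class=...`, or directly on `sc._jsc.hadoopConfiguration()` after the context is created. Despite this, the job still stages output under _temporary, which is the behaviour I am trying to eliminate.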