Using Spark to write a parquet file to s3 over s3a is very slow

闹比i 2020-12-04 18:16

I'm trying to write a parquet file out to Amazon S3 using Spark 1.6.1. The small parquet that I'm generating is

4 Answers
  •  我在风中等你
    2020-12-04 19:06

    I also had this issue. In addition to what the others said, here is a complete explanation from AWS: https://aws.amazon.com/blogs/big-data/improve-apache-spark-write-performance-on-apache-parquet-formats-with-the-emrfs-s3-optimized-committer/

    In my experiments, just switching from FileOutputCommitter v1 to v2 made the write 3-4x faster.

    self.sc._jsc.hadoopConfiguration().set("mapreduce.fileoutputcommitter.algorithm.version", "2")
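    The same setting can also be applied cluster-wide instead of mutating the Hadoop configuration in code; a sketch of the config fragment, assuming `spark-defaults.conf` or `--conf` on `spark-submit` (Spark forwards any property with the `spark.hadoop.` prefix into the Hadoop configuration):

        # spark-defaults.conf (or pass via --conf on spark-submit)
        # Algorithm v2 moves task output to the final destination during
        # task commit, avoiding the slow serial rename in job commit --
        # at the cost of weaker atomicity if a task fails mid-commit.
        spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2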
    
