S3Guard and Parquet magic committer for S3A on EMR 6.x
Question: We are using CDH 5.13 with Spark 2.3.0 and S3Guard. After running the same job on EMR 5.x / 6.x with the same resources, we see a 5-20x performance degradation. According to https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-committer-reqs.html the default committer (since 5.20) is not a good fit for S3A. We tested EMR 5.15.1 and got the same results as on Hadoop. When I try to use the magic committer, I get py4j.protocol.Py4JJavaError: An error occurred while calling o72.save. : java
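For reference, here is a minimal sketch of the settings usually involved in selecting the S3A magic committer from PySpark. It assumes a Hadoop 3.1+ S3A connector and the `spark-hadoop-cloud` module on the classpath, which may not be the case on a given EMR release or with Spark 2.3.0. The helper names `magic_committer_conf` and `apply_conf` are hypothetical, and this is not a diagnosis of the truncated error above.

```python
def magic_committer_conf():
    """Return Spark settings commonly used to select the S3A magic committer.

    Assumption: the cluster ships Hadoop 3.1+ S3A and the spark-hadoop-cloud
    module that provides the two org.apache.spark.internal.io.cloud classes.
    """
    return {
        # Pick the magic committer and allow "magic" paths on the S3A filesystem.
        "spark.hadoop.fs.s3a.committer.name": "magic",
        "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
        # Route s3a:// output through the S3A committer factory.
        "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
            "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
        # Bind Spark SQL and Parquet writes to the path output committer.
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }


def apply_conf(builder, conf):
    """Apply every key/value pair to a SparkSession.Builder-like object."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Usage (requires pyspark; the bucket name is a placeholder):
#   from pyspark.sql import SparkSession
#   spark = apply_conf(SparkSession.builder, magic_committer_conf()).getOrCreate()
#   df.write.parquet("s3a://<bucket>/path")
```

If either of the two `org.apache.spark.internal.io.cloud` classes is missing from the classpath, the write may fail when the committer is instantiated, so checking the full Java stack trace for a `ClassNotFoundException` is a reasonable first step.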