spark-1.4.1 saveAsTextFile to S3 is very slow on emr-4.0.0

执念已碎 2020-12-11 04:24

I run Spark 1.4.1 on Amazon AWS EMR 4.0.0.

For some reason, Spark's saveAsTextFile is very slow on EMR 4.0.0 compared to EMR 3.8 (it used to take 5 seconds; it now takes 95 seconds).

Act

1 Answer
  • 2020-12-11 04:59

    To solve the problem, I added the following settings to mapred-site.xml, as suggested by Neil Jonkers on the user@spark.apache.org mailing list:

    <property>
      <name>mapred.output.direct.EmrFileSystem</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.output.direct.NativeS3FileSystem</name>
      <value>true</value>
    </property>
    

    This can also be done when creating the cluster, by adding the following to the aws CLI command:

    classification=mapred-site,properties=[mapred.output.direct.EmrFileSystem=true,mapred.output.direct.NativeS3FileSystem=true]
    

    or by adding the following to a configuration JSON file:

      {
        "Classification": "mapred-site",
        "Properties": {
          "mapred.output.direct.EmrFileSystem": "true",
          "mapred.output.direct.NativeS3FileSystem": "true"
        }
      }
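Both forms above carry the same two mapred-site properties. The sketch below is not part of the original answer; it generates both from a single Python dict so the CLI shorthand and the JSON file cannot drift apart. The helper names and the file name configurations.json are hypothetical. The one structural assumption is that an EMR configurations file is a JSON list of classification objects, which is why the object from the answer is wrapped in a list here; such a file would typically be passed as --configurations file://./configurations.json when creating the cluster.

```python
import json

# The two properties from the answer; EMR expects string values in "Properties".
DIRECT_OUTPUT_PROPS = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}


def configurations_json(classification, props):
    """Return an EMR configurations document: a JSON list of classification objects."""
    return json.dumps([{"Classification": classification, "Properties": props}], indent=2)


def shorthand(classification, props):
    """Return the same settings in the shorthand form quoted in the answer."""
    pairs = ",".join(f"{key}={value}" for key, value in props.items())
    return f"classification={classification},properties=[{pairs}]"


if __name__ == "__main__":
    # Hypothetical output file name; adjust to your own setup.
    with open("configurations.json", "w") as fh:
        fh.write(configurations_json("mapred-site", DIRECT_OUTPUT_PROPS) + "\n")
    # Prints the shorthand string, identical to the one in the answer.
    print(shorthand("mapred-site", DIRECT_OUTPUT_PROPS))
```

Keeping the properties in one dict means adding a third setting later only requires editing one place.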
    