spark-1.4.1 saveAsTextFile to S3 is very slow on emr-4.0.0

执念已碎 2020-12-11 04:24

I run Spark 1.4.1 on Amazon AWS EMR 4.0.0.

For some reason, Spark's saveAsTextFile is very slow on EMR 4.0.0 compared to EMR 3.8 (it took 5 sec before, now it takes 95 sec).

Act

1 answer
  •  盖世英雄少女心
    2020-12-11 04:59

    To solve the problem, I added the following settings to mapred-site.xml, as suggested by Neil Jonkers on user@spark.apache.org:

    
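    <property>
      <name>mapred.output.direct.EmrFileSystem</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.output.direct.NativeS3FileSystem</name>
      <value>true</value>
    </property>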
    

    This can be done by adding the following to the aws command that creates the cluster:

    classification=mapred-site,properties=[mapred.output.direct.EmrFileSystem=true,mapred.output.direct.NativeS3FileSystem=true]
    

    or by adding the following to the configuration JSON file:

      {
        "Classification": "mapred-site",
        "Properties": {
          "mapred.output.direct.EmrFileSystem": "true",
          "mapred.output.direct.NativeS3FileSystem": "true"
        }
      }
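
If you would rather generate both forms from one place than maintain them by hand, here is a minimal Python sketch. The two property names and the mapred-site classification come from the answer above; the helper names, the output file name `emr-configurations.json`, and the choice to wrap the object in a JSON array (which is how I understand EMR configuration files to be laid out) are assumptions for illustration only.

```python
import json

# Property names taken from the answer above.
DIRECT_OUTPUT_PROPERTIES = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}


def build_configurations(properties, classification="mapred-site"):
    """Return a list holding one configuration object (hypothetical helper).

    Assumption: the configuration file is a JSON array of such objects.
    """
    return [{"Classification": classification, "Properties": dict(properties)}]


def to_shorthand(configuration):
    """Render one configuration object in the shorthand form shown above."""
    props = ",".join(
        f"{key}={value}" for key, value in configuration["Properties"].items()
    )
    return f"classification={configuration['Classification']},properties=[{props}]"


def write_configurations(path, configurations):
    """Write the configurations to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(configurations, handle, indent=2)


if __name__ == "__main__":
    configurations = build_configurations(DIRECT_OUTPUT_PROPERTIES)
    # Placeholder file name; use whatever path you pass to the aws command.
    write_configurations("emr-configurations.json", configurations)
    print(to_shorthand(configurations[0]))
```

Running the script writes the JSON file and prints the shorthand string, so the two variants cannot drift apart.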
    
