I run Spark 1.4.1 on Amazon AWS EMR 4.0.0.
For some reason, Spark's saveAsTextFile is very slow on EMR 4.0.0 compared to EMR 3.8 (it took 5 sec before, now it takes 95 sec).
To solve the problem, I added the following settings to mapred-site.xml, as suggested by Neil Jonkers on user@spark.apache.org:
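<property>
  <name>mapred.output.direct.EmrFileSystem</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.direct.NativeS3FileSystem</name>
  <value>true</value>
</property>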
This can be done by adding the following to the AWS CLI command:
classification=mapred-site,properties=[mapred.output.direct.EmrFileSystem=true,mapred.output.direct.NativeS3FileSystem=true]
or by adding the following to the configuration JSON file:
{
  "Classification": "mapred-site",
  "Properties": {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true"
  }
}
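If you prefer to generate the configuration file rather than write it by hand, something like the following Python sketch should work. It only builds the same mapred-site classification shown above and renders it as JSON. The function names are my own, and wrapping the object in a list is an assumption on my part about the shape EMR expects for a configurations file, so check it against your setup.

```python
import json


def build_configurations():
    """Return the mapred-site classification from this post.

    Assumption: the configurations file is a JSON list of
    classification objects, so the single object is wrapped in a list.
    """
    return [
        {
            "Classification": "mapred-site",
            "Properties": {
                # Both values are strings, matching the JSON in this post.
                "mapred.output.direct.EmrFileSystem": "true",
                "mapred.output.direct.NativeS3FileSystem": "true",
            },
        }
    ]


def render_configurations():
    """Serialize the configuration list as indented JSON text."""
    return json.dumps(build_configurations(), indent=2)


if __name__ == "__main__":
    # Print the JSON so it can be redirected into a file of your choice.
    print(render_configurations())
```

Redirect the output into a file and pass that file to the cluster creation command in place of the inline settings.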