Running Spark app on EMR is slow
I am new to Spark and MapReduce, and I have a problem running Spark on an AWS Elastic MapReduce (EMR) cluster: jobs on EMR take a lot of time. For example, I have a .csv file with a few million records that I read and convert into a JavaRDD. Spark took 104.99 seconds to compute simple mapToDouble() and sum() functions on this dataset. When I did the same calculation without Spark, using Java 8 and converting the .csv file into a List, it took only 0.5 seconds.
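For reference, the plain Java 8 baseline described above might look roughly like the sketch below. The file name `data.csv`, the class name `CsvSum`, the helper `sumColumn`, and the column index are hypothetical placeholders. The sketch also assumes a simple comma-separated file with no header row and no quoted commas.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CsvSum {
    // Hypothetical helper: sums the numeric values in column `col` of
    // already-loaded CSV lines. Assumes no header and no quoted commas.
    public static double sumColumn(List<String> lines, int col) {
        return lines.stream()
                .mapToDouble(line -> Double.parseDouble(line.split(",")[col].trim()))
                .sum();
    }

    public static void main(String[] args) throws IOException {
        // "data.csv" and column 0 are placeholders, not taken from the question.
        List<String> lines = Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "data.csv"));
        long start = System.nanoTime();
        double total = sumColumn(lines, 0);
        double seconds = (System.nanoTime() - start) / 1e9;
        // Only the in-memory sum is timed here, not reading the file.
        System.out.println("sum = " + total + " in " + seconds + " s");
    }
}
```

The Spark version applies the same per-line parse through `JavaRDD.mapToDouble(...)`, which returns a `JavaDoubleRDD`, followed by `.sum()`. When comparing the two timings, it may matter which steps each measurement includes: here the file is loaded into memory before the clock starts, while a Spark job's wall-clock time usually also covers reading the input and scheduling tasks on the cluster.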