Spark: Inconsistent performance when scaling the number of cores
I am running a simple scaling test on Spark using a sort benchmark, from 1 core up to 8 cores. I notice that 8 cores is slower than 1 core.

//run spark using 1 core
spark-submit --master local[1] --class john.sort sort.jar data_800MB.txt data_800MB_output

//run spark using 8 cores
spark-submit --master local[8] --class john.sort sort.jar data_800MB.txt data_800MB_output

The input and output directories in each case are in HDFS.

1 core: 80 secs
8 cores: 160 secs

I would expect 8 cores to give x amount of speedup.

user6910411

Theoretical limitations

I assume you are familiar with Amdahl's