Spark 1.6 on EMR writing to S3 as Parquet hangs and fails
I'm building an uber jar Spark application and submitting it to an EMR 4.3 cluster with spark-submit. I'm provisioning four r3.xlarge instances: one master and three core nodes. Hadoop 2.7.1, Ganglia 3.7.2, Spark 1.6, and Hive 1.0.0 are pre-installed from the console. I'm running the following command (it was cut off when I pasted it here):

```
spark-submit \
  --deploy-mode cluster \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 4 \
  --driver-memory 4g \
  --driver-cores 2 \
  --conf "spark.driver.maxResultSize=2g" \
  --conf "spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet
```
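For context, the truncated `--conf` above looks like it points at Spark's `DirectParquetOutputCommitter`, which skips the slow copy-and-rename commit step of the default `FileOutputCommitter` on S3. Below is a minimal Scala sketch of the equivalent setup done in code rather than on the command line. The fully qualified class name and the S3 paths are assumptions: the committer's package moved between Spark releases, so verify the exact FQCN against your Spark 1.6 build.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetToS3 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("parquet-to-s3")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Equivalent to passing --conf "spark.hadoop.spark.sql.parquet.output.committer.class=..."
    // on spark-submit: the spark.hadoop. prefix injects the key into the Hadoop Configuration.
    // The class name here is an assumption based on the truncated value in the question;
    // its package differs across Spark versions, so check it against your distribution.
    sc.hadoopConfiguration.set(
      "spark.sql.parquet.output.committer.class",
      "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

    // Hypothetical input and output paths, for illustration only.
    val df = sqlContext.read.json("s3://my-bucket/input/")
    df.write.parquet("s3://my-bucket/output/")

    sc.stop()
  }
}
```

The usual motivation for this setting is that the default committer writes task output to a temporary location and then renames it, and renames on S3 are copies, which can make large Parquet writes appear to hang at the end of the job.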