I'm running Apache Spark 1.3.1 on Scala 2.11.2, and when running on an HPC cluster with large enough data, I get numerous errors like the ones at the bottom of my post.
This appears to be a bug related to the Netty networking system (block transfer service), added in Spark 1.2. Adding .set("spark.shuffle.blockTransferService", "nio") to my SparkConf fixed the bug, so now everything works perfectly.
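For reference, here is a minimal sketch of the change (the app name and master below are placeholders, not from my actual job):

import org.apache.spark.{SparkConf, SparkContext}

// Switch the shuffle block transfer service from the default "netty"
// back to the older "nio" implementation.
val conf = new SparkConf()
  .setAppName("example-app")   // placeholder
  .setMaster("yarn-client")    // placeholder; use whatever your cluster uses
  .set("spark.shuffle.blockTransferService", "nio")

val sc = new SparkContext(conf)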
I found a post on the spark-user mailing list from someone who was running into similar errors, and they suggested trying nio instead of Netty.
SPARK-5085 is similar, in that changing from Netty to nio fixed their issue; however, they were also able to fix it by changing some networking settings. (I haven't tried this myself yet, since I'm not sure I have the right access privileges on the cluster.)
It's also possible that your Maven configuration doesn't match your Spark server installation. For example, you may have picked up a pom.xml from a blog post or tutorial:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.3.1</version>
    </dependency>
</dependencies>
But you could have downloaded the latest 2.3 release from the Apache Spark website, in which case the client and server versions no longer match.
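If you're not sure which version the cluster is actually running, one quick sanity check (just a sketch; sc is your SparkContext) is to print the runtime version and compare it with the <version> declared in your pom.xml; running spark-submit --version on the cluster gives the same information.

// Print the Spark version the cluster is actually running, so you can
// compare it with the spark-core version declared in pom.xml.
println(s"Runtime Spark version: ${sc.version}")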