Getting error in Spark: Executor lost

Submitted by 独自空忆成欢 on 2019-12-04 15:32:06
Glenn Strycker

This isn't a Spark bug per se, but is probably related to the settings you have for Java, YARN, and your Spark config file.

see http://apache-spark-user-list.1001560.n3.nabble.com/Executor-Lost-Failure-td18486.html

You'll want to increase your Java memory, increase your Akka frame size, increase the Akka timeout settings, etc.

Try the following spark.conf:

spark.master                       yarn-cluster
spark.yarn.historyServer.address   <your cluster url>
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://<your history directory>
spark.driver.extraJavaOptions      -Xmx20480m -XX:MaxPermSize=2048m -XX:ReservedCodeCacheSize=2048m
spark.checkpointDir                hdfs://<your checkpoint directory>
yarn.log-aggregation-enable        true
spark.shuffle.service.enabled      true
spark.shuffle.service.port         7337
spark.shuffle.consolidateFiles     true
spark.sql.parquet.binaryAsString   true
spark.speculation                  false
spark.yarn.maxAppAttempts          1
spark.akka.askTimeout              1000
spark.akka.timeout                 1000
spark.akka.frameSize               1000
spark.rdd.compress                 true
spark.storage.memoryFraction       1
spark.core.connection.ack.wait.timeout 600
spark.driver.maxResultSize         0
spark.task.maxFailures             20
spark.shuffle.io.maxRetries        20
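A file like the one above is just whitespace-separated key/value lines, so a small sanity check (a hypothetical helper, not part of Spark) can catch malformed lines or accidentally duplicated keys before you deploy the config:

```python
def parse_spark_conf(text):
    """Parse spark-defaults.conf-style text into a dict.

    Each non-blank, non-comment line is '<key> <value>'; duplicates
    raise so typos are caught early.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if not value:
            raise ValueError("malformed line: %r" % line)
        if key in conf:
            raise ValueError("duplicate key: %s" % key)
        conf[key] = value
    return conf

sample = """
spark.akka.frameSize     1000
spark.task.maxFailures   20
"""
print(parse_spark_conf(sample)["spark.akka.frameSize"])  # → 1000
```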

You might also want to play around with how many partitions you are requesting inside your Spark program, and you may want to add some partitionBy(partitioner) statements to your RDDs, so your code might look like this:

num_partitions = <your number of partitions>

# note: in PySpark, partitionBy() only applies to key-value RDDs,
# so use repartition() on a plain textFile RDD
rdd = sc.textFile("<path/to/file>").repartition(num_partitions)
h = rdd.first()
header_rdd = rdd.filter(lambda l: h in l)  # filter, not map: select the header lines
data_rdd = rdd.subtract(header_rdd)
data_rdd.first()
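The header-removal idea (grab the first line, filter out lines matching it, subtract them from the full set) can be checked locally without a cluster. This is a plain-Python sketch of the same logic, assuming the header text does not also occur inside data lines:

```python
lines = [
    "id,name,score",  # header row
    "1,alice,0.9",
    "2,bob,0.7",
]

header = lines[0]  # rdd.first() equivalent
# keep lines containing the header text, then subtract them
# from the full set, leaving only the data rows
header_lines = [l for l in lines if header in l]
data_lines = [l for l in lines if l not in header_lines]

print(data_lines[0])  # → 1,alice,0.9
```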

Finally, you may need to play around with your spark-submit command and add parameters for the number of executors, executor memory, and driver memory:

./spark-submit --master yarn --deploy-mode client --num-executors 100 --driver-memory 20G --executor-memory 10g <path/to/.py file>
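Before submitting, it's worth checking that the total memory you're asking for actually fits on the cluster. A rough back-of-the-envelope sketch (the overhead factor is an assumption, approximating YARN's default ~10% per-container memory overhead):

```python
def total_memory_gb(num_executors, executor_mem_gb, driver_mem_gb,
                    overhead_factor=1.10):
    """Rough total cluster memory a spark-submit will request.

    overhead_factor approximates YARN's per-container overhead
    (spark.yarn.executor.memoryOverhead, roughly 10% by default).
    """
    executors = num_executors * executor_mem_gb * overhead_factor
    driver = driver_mem_gb * overhead_factor
    return executors + driver

# the command above: 100 executors x 10g plus a 20G driver
print(round(total_memory_gb(100, 10, 20), 1))  # → 1122.0
```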

I got an "executor lost" error because I was using the sc.wholeTextFiles() call and one of my input files was 149 MB, which caused the executor to fail. I don't think 149 MB is actually very large, but it was enough to kill the executor.
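Since sc.wholeTextFiles() materializes each file as a single in-memory record on one executor, a quick pre-flight scan of file sizes can tell you whether to fall back to sc.textFile() or raise executor memory. This hypothetical helper uses only the standard library:

```python
import os

def files_over_limit(directory, limit_bytes=100 * 1024 * 1024):
    """Return paths under `directory` larger than limit_bytes.

    Useful before calling sc.wholeTextFiles(), which loads each file
    as one record in a single executor's memory.
    """
    big = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit_bytes:
                big.append(path)
    return big
```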
