Spark Job error: YarnAllocator: Exit status: -100. Diagnostics: Container released on a *lost* node

Submitted by 蓝咒 on 2019-12-09 06:05:47

Question


I am running a job on AWS EMR 4.1 with Spark 1.5, using the following configuration:

spark-submit --deploy-mode cluster --master yarn-cluster --driver-memory 200g --driver-cores 30 --executor-memory 70g --executor-cores 8 --num-executors 90 --conf spark.storage.memoryFraction=0.45 --conf spark.shuffle.memoryFraction=0.75 --conf spark.task.maxFailures=1 --conf spark.network.timeout=1800s

Then I got the error below. Where can I find out what "Exit status: -100" means, and how might I fix this problem? Thanks!


15/12/05 05:54:24 INFO TaskSetManager: Finished task 176.0 in stage 957.0 (TID 128408) in 130885 ms on ip-10-155-195-239.ec2.internal (106/800)
15/12/05 05:54:24 INFO YarnAllocator: Completed container container_1449241952863_0004_01_000026 (state: COMPLETE, exit status: -100)
15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: container_1449241952863_0004_01_000026. Exit status: -100. Diagnostics: Container released on a *lost* node
15/12/05 05:54:24 INFO YarnAllocator: Completed container container_1449241952863_0004_01_000055 (state: COMPLETE, exit status: -100)
15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: container_1449241952863_0004_01_000055. Exit status: -100. Diagnostics: Container released on a *lost* node
15/12/05 05:54:24 ERROR YarnClusterScheduler: Lost executor 24 on ip-10-147-11-212.ec2.internal: Yarn deallocated the executor 24 (container container_1449241952863_0004_01_000026)
15/12/05 05:54:24 INFO TaskSetManager: Re-queueing tasks for 24 from TaskSet 957.0
15/12/05 05:54:24 WARN TaskSetManager: Lost task 382.0 in stage 957.0 (TID 128614, ip-10-147-11-212.ec2.internal): ExecutorLostFailure (executor 24 lost)
15/12/05 05:54:24 ERROR TaskSetManager: Task 382 in stage 957.0 failed 1 times; aborting job
15/12/05 05:54:24 WARN TaskSetManager: Lost task 208.0 in stage 957.0 (TID 128440, ip-10-147-11-212.ec2.internal): ExecutorLostFailure (executor 24 lost)
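
To make a log like the one above easier to read, here is a minimal sketch that groups the "Container marked as failed" lines by exit status and attaches a symbolic name to each code. The function names (`summarize_failed_containers`, `describe`) and the regular expression are hypothetical helpers, not part of Spark or YARN. The name table is a small subset of what I understand Hadoop's `ContainerExitStatus` class to define (where -100 is `ABORTED`), so check it against the Hadoop version on your cluster.

```python
import re
from collections import defaultdict

# Assumed subset of the constants in Hadoop's ContainerExitStatus class;
# verify against the Hadoop release running on the cluster.
YARN_EXIT_STATUS = {
    0: "SUCCESS",
    -100: "ABORTED",
    -101: "DISKS_FAILED",
    -102: "PREEMPTED",
    -1000: "INVALID",
}

# Hypothetical pattern matching the YarnAllocator lines shown in the log above.
FAILED_RE = re.compile(
    r"Container marked as failed: (?P<container>container_\w+)\. "
    r"Exit status: (?P<status>-?\d+)\. Diagnostics: (?P<diag>.*)"
)


def summarize_failed_containers(log_lines):
    """Group (container id, diagnostics) pairs by numeric exit status."""
    failures = defaultdict(list)
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            status = int(match.group("status"))
            failures[status].append(
                (match.group("container"), match.group("diag").strip())
            )
    return dict(failures)


def describe(status):
    """Return the symbolic name for an exit status, or UNKNOWN."""
    return YARN_EXIT_STATUS.get(status, "UNKNOWN")


# Two lines copied from the log in the question.
sample = [
    "15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: "
    "container_1449241952863_0004_01_000026. Exit status: -100. "
    "Diagnostics: Container released on a *lost* node",
    "15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: "
    "container_1449241952863_0004_01_000055. Exit status: -100. "
    "Diagnostics: Container released on a *lost* node",
]

summary = summarize_failed_containers(sample)
for status, items in summary.items():
    print(status, describe(status), [container for container, _ in items])
```

Running this over the full driver log shows at a glance whether every failed container shares the same exit status and diagnostics, which helps separate node loss from other container failures.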

Source: https://stackoverflow.com/questions/34102083/spark-job-error-yarnallocator-exit-status-100-diagnostics-container-releas
