Spark on Yarn: How to prevent multiple spark jobs being scheduled

Submitted by 大憨熊 on 2019-12-11 11:32:58

Question


With Spark on YARN, I don't see a way to prevent concurrent jobs from being scheduled. My architecture is set up for purely batch processing.

I need this for the following reasons:

  • Resource Constraints
  • The UserCache for Spark grows really quickly; running multiple jobs causes an explosion of space in the cache.

Ideally, I'd love to know if there is a config that would ensure only one job runs on YARN at any time.


Answer 1:


You can create a queue that can host only one ApplicationMaster and run all Spark jobs on that queue. Thus, if a Spark job is running, other jobs will be accepted but they won't be scheduled and run until the running execution has finished...
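A minimal capacity-scheduler.xml sketch of this approach is below. The queue name "spark" and the capacity values are assumptions, not from the answer; the key setting is maximum-am-resource-percent, which caps how much of the queue's resources ApplicationMasters may consume, so once one AM is running, further applications sit in the ACCEPTED state until it finishes.

    <!-- capacity-scheduler.xml (sketch): dedicated queue for Spark batch jobs.
         Queue name "spark" and the capacity split are illustrative assumptions. -->
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,spark</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>50</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.spark.capacity</name>
        <value>50</value>
      </property>
      <!-- Cap the fraction of the queue's resources that ApplicationMasters may
           use. Pick a value small enough that only one AM fits at a time;
           additional applications are ACCEPTED but not scheduled until the
           running AM exits. -->
      <property>
        <name>yarn.scheduler.capacity.root.spark.maximum-am-resource-percent</name>
        <value>0.1</value>
      </property>
    </configuration>

Jobs are then pinned to that queue at submission time, e.g. spark-submit --queue spark ... (or spark.yarn.queue=spark), and the scheduler configuration is reloaded with yarn rmadmin -refreshQueues.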




Answer 2:


Finally found the solution; it was in the YARN documentation: yarn.scheduler.capacity.maximum-applications has to be set to 1 instead of the default 10000.
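For reference, a sketch of that setting in capacity-scheduler.xml is below. This property limits the number of concurrently active (running and pending) applications cluster-wide; there is also a per-queue variant, yarn.scheduler.capacity.<queue-path>.maximum-applications, if only one queue should be restricted.

    <!-- capacity-scheduler.xml (sketch): allow at most one active application
         (running or pending) in the whole cluster. -->
    <property>
      <name>yarn.scheduler.capacity.maximum-applications</name>
      <value>1</value>
    </property>

After editing the file, refresh the scheduler with yarn rmadmin -refreshQueues. Note that, per the Capacity Scheduler documentation, applications submitted beyond this limit are rejected rather than queued, so a second spark-submit will fail while the first job is still running.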



Source: https://stackoverflow.com/questions/36590856/spark-on-yarn-how-to-prevent-multiple-spark-jobs-being-scheduled
