Question:
With Spark on YARN I don't see a way to prevent concurrent jobs from being scheduled. My architecture is set up for purely batch processing.
I need this for the following reasons:
- Resource Constraints
- The user cache for Spark grows very quickly; running multiple jobs at once causes an explosion of cache space.
Ideally I'd like to know whether there is a configuration that ensures only one job runs on YARN at any time.
Answer 1:
You can create a queue that can host only one application master and run all Spark jobs on that queue. Then, if a Spark job is already running, other jobs will be accepted but won't be scheduled and run until the running one has finished.
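A minimal capacity-scheduler.xml sketch of this approach is below. The queue name spark_batch and the exact values are assumptions for illustration; the idea is to cap the ApplicationMaster resource share of the queue so that only one AM fits at a time.

    <!-- capacity-scheduler.xml (sketch): a single "spark_batch" queue whose
         AM resource share is small enough that only one ApplicationMaster
         can run at a time. Queue name and values are illustrative. -->
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>spark_batch</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.spark_batch.capacity</name>
        <value>100</value>
      </property>
      <!-- Fraction of the queue's resources that ApplicationMasters may use.
           Tune it so that exactly one Spark AM fits; a second job is then
           ACCEPTED but stays pending until the first one finishes. -->
      <property>
        <name>yarn.scheduler.capacity.root.spark_batch.maximum-am-resource-percent</name>
        <value>0.1</value>
      </property>
    </configuration>

All Spark jobs then have to be submitted to that queue, e.g. spark-submit --master yarn --queue spark_batch ..., where the queue name matches whatever you defined in capacity-scheduler.xml.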
Answer 2:
Finally found the solution in the YARN documentation: yarn.scheduler.capacity.maximum-applications has to be set to 1 instead of the default 10000.
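For reference, a sketch of that setting in capacity-scheduler.xml (note this is a cluster-wide limit on active applications):

    <!-- capacity-scheduler.xml (sketch): cluster-wide cap on active applications.
         With a value of 1, only one application may be pending or running at a
         time; submissions beyond the limit are typically rejected at submission
         time rather than queued. -->
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>1</value>
      </property>
    </configuration>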
Source: https://stackoverflow.com/questions/36590856/spark-on-yarn-how-to-prevent-multiple-spark-jobs-being-scheduled