Question
I have an sbatch (Slurm job scheduler) script in which I process a lot of data through three scripts: foo1.sh, foo2.sh and foo3.sh.
foo1.sh and foo2.sh are independent, and I want to run them simultaneously. foo3.sh needs the outputs of foo1.sh and foo2.sh, so I am building a dependency. I then have to repeat this 30 times.
Let's say:
## Resources config
#SBATCH --ntasks=30
#SBATCH --task-per-core=1
for i in {1..30};
do
    srun -n 1 --jobid=foo1_$i ./foo1.sh &
    srun -n 1 --jobid=foo2_$i ./foo2.sh &
    srun -n 1 --jobid=foo3_$i --dependency=afterok:foo1_$i:foo2_$i ./foo3.sh &
done;
wait
The idea is that foo1_1 and foo2_1 are launched, and since foo3_1 has to wait for those two jobs to finish, the loop moves on to the next iteration. The next iteration launches foo1_2 and foo2_2, foo3_2 waits, and so on.
At some point, then, the number of subjobs launched with srun will exceed --ntasks=30. What happens then? Will the extra subjobs wait for a previous one to finish (the behavior I am looking for)?
Thanks
Answer 1:
Slurm will run 30 sruns, but the 31st will wait until a core is freed within your 30-core allocation.
Note that the proper option is --ntasks-per-core=1, not --task-per-core=1.
You can test this yourself by using salloc rather than sbatch to work interactively:
$ salloc --ntasks=2 --ntasks-per-core=1
$ srun -n 1 sleep 10 & srun -n 1 sleep 10 & time srun -n 1 echo ok
[1] 2734
[2] 2735
ok
[1]-  Done                    srun -n 1 sleep 10
[2]+  Done                    srun -n 1 sleep 10
real    0m10.201s
user    0m0.072s
sys     0m0.028s
The simple echo took 10 seconds because the third srun had to wait until the first two had finished, since the allocation has only two cores.
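If the only goal of the dependency is that foo3.sh starts after the foo1.sh and foo2.sh of the same iteration have succeeded, one possible alternative is to express that ordering with the shell's own `wait` inside the batch script. The following is a minimal sketch under that assumption, not a tested production script: `launch` and `run_triplet` are hypothetical helper names, and `launch` falls back to running the command directly when `SLURM_JOB_ID` is unset so the logic can be dry-run outside Slurm.

```shell
#!/bin/bash
#SBATCH --ntasks=30
#SBATCH --ntasks-per-core=1

# Hypothetical wrapper: inside a Slurm allocation each step goes through
# `srun -n 1`; outside Slurm the command is run directly (dry run).
launch() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        srun -n 1 "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: run the first two steps in parallel, then run the
# third only if both exited with status 0 (a shell-level stand-in for afterok).
run_triplet() {
    local first=$1 second=$2 third=$3
    local pid1 pid2 ok=0
    launch "$first" &
    pid1=$!
    launch "$second" &
    pid2=$!
    wait "$pid1" || ok=1
    wait "$pid2" || ok=1
    [ "$ok" -eq 0 ] && launch "$third"
}

# Start all 30 triplets in the background, then wait for every one of them.
# The loop is skipped when the scripts are not present (e.g. in a dry run).
if [ -x ./foo1.sh ] && [ -x ./foo2.sh ] && [ -x ./foo3.sh ]; then
    for i in {1..30}; do
        run_triplet ./foo1.sh ./foo2.sh ./foo3.sh &
    done
    wait
fi
```

With this layout up to 60 srun steps are requested at once, so, as described above, the steps beyond the 30-task allocation are expected to wait until a core is freed.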
Answer 2:
If you kick off more subtasks than you have cores or hyperthreads, the OS scheduling algorithms should handle prioritizing the tasks. How this is implemented under the hood depends on which OS you are running, even if they are all Unix-based.
But you are correct in your assumption that if you run out of cores, your parallel tasks must, in a sense, 'wait their turn'.
Source: https://stackoverflow.com/questions/26812354/what-happens-if-i-am-running-more-subjobs-than-the-number-of-core-allocated