Questions on alternative ways to run 4 parallel jobs

Posted by 前提是你 on 2019-12-04 18:51:25
damienfrancois

Q: Why would one choose one such approach over the others?

Script 0: you request 4 tasks, to be allocated at the same time to a single job, with no other specification as to how those tasks should be allocated to nodes. Typical use: an MPI program.

Script 1: you request 4 jobs, each with 1 task. The jobs will be scheduled independently of one another. Typical use: embarrassingly parallel jobs.

Script 2: you request 4 nodes, with one task per node. It is similar to Script 0 except that you request the tasks to be allocated to four distinct nodes. Typical use: an MPI program that does a lot of I/O on local disks, for instance.

All jobs were allocated the same first node because Slurm always allocates the nodes in the same order, and you probably ran all the tests one after another, so each one started on the resources the previous one had just freed.
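
The scripts themselves are not reproduced here, so as a rough illustration only, the three requests described for Scripts 0 to 2 might look like the sketch below. The directives are hypothetical reconstructions, and using a job array for Script 1 is an assumption (it is just one way to get four independent one-task jobs). The snippet only writes the files to a temporary directory; it does not submit anything.

```shell
#!/bin/bash
# Hypothetical reconstructions of Scripts 0-2 (the originals are not shown here).
# Nothing is submitted: the files are only written so they can be inspected.
outdir=$(mktemp -d)

# Script 0 (sketch): one job, four tasks, placement left to Slurm.
cat > "$outdir/script0.sh" <<'EOF'
#!/bin/bash
#SBATCH -n 4

srun -l hostname -s
EOF

# Script 1 (sketch): four independent one-task jobs, written here as a job array.
cat > "$outdir/script1.sh" <<'EOF'
#!/bin/bash
#SBATCH -n 1
#SBATCH --array=0-3

srun -l hostname -s
EOF

# Script 2 (sketch): one job, four nodes, one task on each node.
cat > "$outdir/script2.sh" <<'EOF'
#!/bin/bash
#SBATCH -N 4
#SBATCH --ntasks-per-node 1

srun -l hostname -s
EOF

ls "$outdir"
# To try one on a cluster: sbatch "$outdir/script0.sh"
```

Scripts 0 and 2 differ only in whether you constrain where the four tasks land; Script 1 differs in that the scheduler sees four separate jobs.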

Script 3: you request two nodes with, implicitly, 1 task per node, so you are allocated two tasks, but then you try to run 4 tasks with srun. You should change it to

#SBATCH -N 2
#SBATCH --ntasks-per-node 2

srun -l -n 4 hostname -s

to request two tasks per node, or

#SBATCH -N 2
#SBATCH -n 4

srun -l -n 4 hostname -s

to request four tasks, with no additional constraint on the distribution of tasks across nodes.

Script 4: you request two nodes with, implicitly, 1 task per node and, also implicitly, one CPU per task, so you are allocated two CPUs, but then you try to run 4 tasks with srun, each with 2 CPUs, so 8 CPUs in total. You should change it to

#SBATCH -N 2
#SBATCH --ntasks-per-node 2
#SBATCH --cpus-per-task 2

srun -l -n 4 -c 2 hostname -s

or,

#SBATCH -N 2
#SBATCH -n 4
#SBATCH --cpus-per-task 2

srun -l -n 4 -c 2 hostname -s
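
The counting behind Script 4 can be written out as a few lines of shell arithmetic. This is only a sketch of the bookkeeping used above; the function names are made up for illustration, and a real cluster may grant more CPUs than the minimum (whole cores or whole nodes, for instance).

```shell
#!/bin/bash
# cpus_needed TASKS CPUS_PER_TASK: CPUs that an srun step asks for.
cpus_needed() {
    echo $(( $1 * $2 ))
}

# cpus_allocated NODES TASKS_PER_NODE CPUS_PER_TASK: CPUs implied by the #SBATCH request.
cpus_allocated() {
    echo $(( $1 * $2 * $3 ))
}

step=$(cpus_needed 4 2)            # srun -n 4 -c 2
original=$(cpus_allocated 2 1 1)   # -N 2 with the implicit defaults
fixed=$(cpus_allocated 2 2 2)      # -N 2, two tasks per node, two CPUs per task

echo "step needs $step CPUs; original request grants $original, corrected request grants $fixed"
# prints: step needs 8 CPUs; original request grants 2, corrected request grants 8
```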

The bottom line: in the submission script, you request resources with the #SBATCH directives, and you cannot use more resources than that in the subsequent calls to srun.
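
If you want the script itself to catch this kind of mismatch, one option is a small guard before srun. The sketch below assumes the job environment exposes SLURM_NTASKS, SLURM_JOB_NUM_NODES and SLURM_CPUS_PER_TASK; the fallbacks (one task per node, one CPU per task when a variable is unset) are assumptions that mirror the implicit defaults described above, and fits_allocation is a hypothetical helper, not a Slurm command. The check is deliberately conservative: it compares task count and CPUs per task separately.

```shell
#!/bin/bash
# fits_allocation TASKS [CPUS_PER_TASK]
# Succeeds only if the requested step fits what the job asked for.
fits_allocation() {
    local want_tasks=$1
    local want_cpus=${2:-1}
    # Assumed fallbacks when the variables are unset:
    # one task per allocated node, one CPU per task.
    local have_tasks=${SLURM_NTASKS:-${SLURM_JOB_NUM_NODES:-1}}
    local have_cpus=${SLURM_CPUS_PER_TASK:-1}
    [ "$want_tasks" -le "$have_tasks" ] && [ "$want_cpus" -le "$have_cpus" ]
}

# Intended use inside a batch script:
#   fits_allocation 4 2 && srun -l -n 4 -c 2 hostname -s
```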
