How do the terms “job”, “task”, and “step” relate to each other?

落花浮王杯 提交于 2020-04-27 17:32:19

问题


How do the terms "job", "task", and "step" as used in the SLURM docs relate to each other?

AFAICT, a job may consist of multiple tasks, and also it make consist of multiple steps, but, assuming this is true, it's still not clear to me how tasks and steps relate.

It would be helpful to see an example showing the full complexity of jobs/tasks/steps.


回答1:


A job consists in one or more steps, each consisting in one or more tasks each using one or more CPU.

Jobs are typically created with the sbatch command, steps are created with the srun command, tasks are requested (at the job level or the step level) with --ntasks and CPUs are requested per task with --cpus-per-task. Note that jobs submitted with sbatch have one implicit step; the Bash script itself.

Assume the hypothetical job:

#SBATCH --nodes 8
#SBATCH --tasks-per-node 8
# The job requests 64 CPUs, on 8 nodes.    

# First step, with a sub-allocation of 8 tasks (one per node) to create a tmp dir. 
# No need for more than one task per node, but it has to run on every node
srun --nodes 8 --tasks 8 mkdir -p /tmp/$USER/$SLURM_JOBID

# Second step with the full allocation (64 tasks) to run an MPI 
# program on some data to produce some output.
srun process.mpi <input.dat >output.txt

# Third step with a sub allocation of 48 tasks (because for instance 
# that program does not scale as well) to post-process the output and 
# extract meaningful information
srun --ntasks 48 --nodes 6 --exclusive postprocess.mpi <output.txt >result.txt &

# Four step with a sub-allocation on a single node (because maybe 
# it is a multithreaded program that cannot use CPUs on distinct nodes)    
# to compress the raw output. This step runs at the same time as 
# the previous one thanks to the ampersand `&` 
OMP_NUM_THREAD=12 srun --ntasks 12 --nodes 1 --exclusive compress output.txt &

wait

Four steps were created and so the accounting information for that job will have 5 lines; one per step plus one for the Bash script itself.



来源:https://stackoverflow.com/questions/46506784/how-do-the-terms-job-task-and-step-relate-to-each-other

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!