SLURM `srun` vs `sbatch` and their parameters

伪装坚强ぢ 2020-12-22 17:37

I am trying to understand what the difference is between SLURM's srun and sbatch commands. I will be happy with a general explanation, rather than specific answers to the f

2 answers
  •  無奈伤痛
    2020-12-22 17:44

    This doesn't actually fully answer the question, but here is some more information I found that may be helpful for someone in the future:


    From a related thread I found with a similar question:

    In a nutshell, sbatch and salloc allocate resources to the job, while srun launches parallel tasks across those resources. When invoked within a job allocation, srun will launch parallel tasks across some or all of the allocated resources. In that case, srun inherits by default the pertinent options of the sbatch or salloc which it runs under. You can then (usually) provide srun different options which will override what it receives by default. Each invocation of srun within a job is known as a job step.
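
    The split between allocation and job steps described above can be sketched as a small batch script. This is only an illustration: the file name two_steps.sh and the resource numbers are assumptions, not taken from the quoted thread. The block writes the script to disk and prints the submit command rather than submitting it.

```shell
#!/bin/bash
# Sketch: the #SBATCH lines describe the allocation that sbatch requests;
# each srun line inside the script starts one job step in that allocation.
# File name and resource values are illustrative assumptions.
cat > two_steps.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=two-steps
#SBATCH --ntasks=4
#SBATCH --time=00:05:00

# Job step 0: no options given, so srun inherits --ntasks=4 from the allocation.
srun hostname

# Job step 1: --ntasks=1 overrides the inherited value for this step only.
srun --ntasks=1 hostname
EOF

echo "wrote two_steps.sh; submit it with: sbatch two_steps.sh"
```

    Submitted with sbatch, the first step would be expected to print four hostnames and the second only one, with the output going to the job's output file (slurm-<jobid>.out by default) instead of the terminal.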

    srun can also be invoked outside of a job allocation. In that case, srun requests resources, and when those resources are granted, launches tasks across those resources as a single job and job step.

    There's a relatively new web page which goes into more detail regarding the -B and --exclusive options.

    doc/html/cpu_management.shtml


    Additional information from the SLURM FAQ page.

    The srun command has two different modes of operation. First, if not run within an existing job (i.e. not within a Slurm job allocation created by salloc or sbatch), then it will create a job allocation and spawn an application. If run within an existing allocation, the srun command only spawns the application. For this question, we will only address the first mode of operation and compare creating a job allocation using the sbatch and srun commands.
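
    One way to picture the two modes: inside an allocation created by sbatch or salloc, Slurm exports SLURM_JOB_ID, and the srun man page describes that variable as selecting the job in which a step is created. The sketch below assumes that this is the signal that matters. srun_mode is a hypothetical helper, not a Slurm command, and it only reports which mode would apply without launching anything.

```shell
#!/bin/bash
# srun_mode is a hypothetical helper, not part of Slurm. It only reports which
# of the two modes srun would be expected to use in the current environment.
srun_mode() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        # Set by sbatch/salloc inside an allocation: srun only adds a job step.
        echo "job step in existing allocation ${SLURM_JOB_ID}"
    else
        # No allocation in the environment: srun must request one first.
        echo "new allocation"
    fi
}

srun_mode
```

    Run from a login shell this should report a new allocation; run from inside a batch script or an salloc session it should report the existing job ID.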

    The srun command is designed for interactive use, with someone monitoring the output. The output of the application is seen as output of the srun command, typically at the user's terminal. The sbatch command is designed to submit a script for later execution and its output is written to a file. Command options used in the job allocation are almost identical. The most noticeable difference in options is that the sbatch command supports the concept of job arrays, while srun does not. Another significant difference is in fault tolerance. Failures involving sbatch jobs typically result in the job being requeued and executed again, while failures involving srun typically result in an error message being generated with the expectation that the user will respond in an appropriate fashion.
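
    The job array difference mentioned above can be sketched with a batch script that uses the --array option of sbatch. The file name array_job.sh, the index range and the echo payload are assumptions chosen for illustration. As before, the block only writes the script and prints the submit command.

```shell
#!/bin/bash
# Sketch: a job array submitted through sbatch.
# File name, index range and payload are illustrative assumptions.
cat > array_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --array=0-3
#SBATCH --ntasks=1
#SBATCH --output=array-demo_%A_%a.out

# Each array task runs this script with its own SLURM_ARRAY_TASK_ID (0 to 3 here).
echo "processing chunk ${SLURM_ARRAY_TASK_ID}"
EOF

echo "wrote array_job.sh; submit it with: sbatch array_job.sh"
```

    In the --output pattern, %A stands for the array's job ID and %a for the task index, so each of the four tasks writes its own file. An interactive counterpart such as `srun --ntasks=1 hostname` would instead print straight to the terminal and has no array equivalent.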


    Another relevant conversation here
