On the semantics of `srun … >output_file` for parallel tasks

Posted by 落爺英雄遲暮 on 2019-12-13 14:21:50

Question


Sorry, this question requires a lot of build-up. In summary, it's about the conditions under which many parallel instances of srun ... >output_file will or won't cause one process/task to clobber the output produced by another process/task.


CASE 0: bash only (no SLURM)

Suppose that prog-0.sh is the following toy script:

#!/bin/bash

hostname >&2

if [[ $JOB_INDEX = 0 ]]
then
    date
fi

This script prints some output to stderr, and possibly prints the current date to stdout.

The "driver" script case-0.sh shown below spawns $NJOBS processes, all writing to prog-0-stdout.txt:

#!/bin/bash

for i in $( seq 0 $(( NJOBS - 1 )) )
do  
    JOB_INDEX=$i ./prog-0.sh >prog-0-stdout.txt &
done

After running

% NJOBS=100 ./case-0.sh 2>prog-0-stderr.txt

...my expectation is that prog-0-stderr.txt will contain 100 lines, and that prog-0-stdout.txt will be empty.

My expectation pans out:

 % wc prog-0-std*.txt
  100  100 3000 prog-0-stderr.txt
    0    0    0 prog-0-stdout.txt
  100  100 3000 total

The explanation for these results is that, when NJOBS is sufficiently large, it is likely that, for some sufficiently high value of $i, the redirection >prog-0-stdout.txt will be evaluated after the "designated job" (the one with JOB_INDEX 0, and the only one that sends output to stdout) has written the date to stdout. Because > truncates the file when it opens it, this clobbers whatever output the designated job had earlier written to prog-0-stdout.txt.
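
The truncation this explanation relies on can be checked in isolation. The following is a minimal sketch, not part of the original scripts; the temporary file and the variable names are made up for illustration. It shows that > empties the target as soon as the redirection is set up, even when the command writes nothing, whereas >> leaves earlier content in place.

```shell
#!/bin/bash
# Illustration only: the temp file stands in for prog-0-stdout.txt.
tmp=$(mktemp)

# The "designated job" writes its output first.
echo "output from the designated job" >"$tmp"

# A later job writes nothing, but opening the file with '>' truncates it.
true >"$tmp"
size_after_truncate=$(( $(wc -c <"$tmp") ))
echo "size after '>':  $size_after_truncate bytes"

# Same sequence with '>>': opening in append mode does not truncate.
echo "output from the designated job" >"$tmp"
true >>"$tmp"
size_after_append=$(( $(wc -c <"$tmp") ))
echo "size after '>>': $size_after_append bytes"

rm -f "$tmp"
```

The first size is zero because the truncation happens when the shell opens the file, before the command runs and regardless of whether it produces output.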

BTW, the value of NJOBS needs to be high enough for the results to be as I've just described. For example, if I use NJOBS=2:

% NJOBS=2 ./case-0.sh 2>prog-0-stderr.txt

...then not only will prog-0-stderr.txt contain only 2 lines (not surprisingly), but prog-0-stdout.txt will contain a date:

% cat prog-0-stdout.txt
Wed Oct  4 15:02:49 EDT 2017

In this case, all the >prog-0-stdout.txt redirections have been evaluated before the designated job prints the date to prog-0-stdout.txt.
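
That ordering can be forced deterministically. In the sketch below (an illustration under assumed names; file descriptor 3 stands in for the designated job's already-open stdout, and the echoed text stands in for the date), the other job's redirection truncates the file before the designated job writes, so the line survives.

```shell
#!/bin/bash
# Illustration only: the temp file stands in for prog-0-stdout.txt.
out=$(mktemp)

exec 3>"$out"                          # designated job: redirection evaluated, nothing written yet
: >"$out"                              # another job: its '>' truncates the still-empty file
echo "date from designated job" >&3    # designated job writes only now
exec 3>&-                              # designated job exits, closing its descriptor

surviving=$(cat "$out")
echo "file contents: $surviving"
rm -f "$out"
```

Swapping the second and third steps reproduces the NJOBS=100 outcome: the write happens first and the later truncation discards it.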


CASE 1: SLURM job arrays

Now, consider a very similar scenario, but using SLURM instead. The script prog-1.sh is identical to prog-0.sh, except that it examines a different variable to decide whether or not to print the date to stdout:

#!/bin/bash

hostname >&2

if [[ $SLURM_ARRAY_TASK_ID = 0 ]]
then
    date
fi

And here's the corresponding "driver" script, case-1.sh:

#!/bin/bash
#SBATCH -t 1
#SBATCH -p test

#SBATCH -e prog-1-%02a-stderr.txt
#SBATCH -n 1
#SBATCH -a 0-99

srun ./prog-1.sh >prog-1-stdout.txt

Like case-0.sh, this script redirects the output of its main step to a single file ./prog-1-stdout.txt.

Importantly, this same file will be seen by all the nodes that run ./prog-1.sh for this job.

If I now run

sbatch case-1.sh

...I get 100 files prog-1-00-stderr.txt ... prog-1-99-stderr.txt, containing 1 line each, and an empty prog-1-stdout.txt. I assume that the earlier explanation also explains why prog-1-stdout.txt is empty.

So far so good.
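
As an aside, one possible way to avoid the clobbering in this case is to give every array element its own stdout file, mirroring the %02a pattern already used for stderr. The sketch below is a local simulation under stated assumptions: it does not call sbatch or srun, it fakes SLURM_ARRAY_TASK_ID for four elements, and run_element and workdir are hypothetical names.

```shell
#!/bin/bash
# Local simulation only: no SLURM involved; SLURM_ARRAY_TASK_ID is set by hand.
workdir=$(mktemp -d)

run_element() {
    # Stand-in for one array element running the batch script.
    local id=$1 outfile
    outfile=$(printf '%s/prog-1-%02d-stdout.txt' "$workdir" "$id")
    # Stand-in for prog-1.sh: only element 0 writes to stdout.
    SLURM_ARRAY_TASK_ID=$id bash -c '
        if [[ $SLURM_ARRAY_TASK_ID = 0 ]]; then date; fi
    ' >"$outfile"
}

for id in 0 1 2 3; do
    run_element "$id" &
done
wait

# Element 0's file keeps its single line, whatever order the elements ran in.
designated_lines=$(( $(wc -l <"$workdir/prog-1-00-stdout.txt") ))
file_count=$(( $(ls "$workdir" | wc -l) ))
echo "lines from element 0: $designated_lines, files created: $file_count"
rm -rf "$workdir"
```

Because no two elements open the same path, the order in which the redirections are evaluated no longer matters.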


CASE 2: SLURM tasks

Finally, consider one more SLURM-based case, this time using the core script prog-2.sh and the driver script case-2.sh. Again, the only change in prog-2.sh is the variable it examines to decide whether or not to print the date to stdout:

#!/bin/bash

hostname >&2

if [[ $SLURM_PROCID = 1 ]]
then
    date
fi

And here is case-2.sh:

#!/bin/bash
#SBATCH -t 1
#SBATCH -p test

#SBATCH -e prog-2-stderr.txt
#SBATCH -N 10
#SBATCH --tasks-per-node=10

srun -l ./prog-2.sh >prog-2-stdout.txt

As before, prog-2-stdout.txt is visible to all the nodes handling the job.

Now, if I run sbatch case-2.sh and wait for the batch job to finish, then prog-2-stderr.txt contains 100 lines (as expected), but, to my surprise, prog-2-stdout.txt is not empty. In fact, it contains a date:

% cat prog-2-stdout.txt
01: Wed Oct  4 15:21:17 EDT 2017

The only explanation I can come up with is analogous to the one I gave earlier for the results I got when I ran

% NJOBS=2 ./case-0.sh 2>prog-0-stderr.txt

If this explanation is correct, my concern is that the fact that case-2.sh worked better than expected (i.e. prog-2-stdout.txt ends up with the right output) is just a coincidence that depends on the relative timing of concurrent events.


Now, at long last, my question is:

Q: does SLURM guarantee that a prog-2-stdout.txt file that contains the output generated by the designated task (i.e. the one that prints the date to stdout) will not be clobbered when the >prog-2-stdout.txt redirection gets evaluated by one of the non-designated tasks?


Answer 1:


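You have a misconception about how srun works. In CASE 1 the use of srun is irrelevant: srun is used in batch scripts to launch parallel tasks, and in CASE 1 each array element has only one task, so

srun ./prog-1.sh >prog-1-stdout.txt

is equivalent to:

./prog-1.sh >prog-1-stdout.txt

Each of the 100 array elements runs the batch script separately, so this redirection is evaluated 100 times, just as in CASE 0.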

CASE 2 is different, as you have more than one task. There, the batch script runs once, so srun -l ./prog-2.sh >prog-2-stdout.txt is evaluated only once, and srun takes care of spawning the 10*10 tasks. srun forwards the output of all the tasks to the master node of the job, which is the only one that writes to prog-2-stdout.txt.

So you can be sure that in this case there will be no clobbering of the output file, as the redirection is evaluated only once.
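
A bash-only analogy may make the difference concrete. This is a sketch, not a description of srun's internals: the brace group plays the role of the single srun invocation, the job function is a hypothetical stand-in for prog-2.sh, and the temporary file replaces prog-2-stdout.txt. The redirection is attached to the whole group, so the file is opened and truncated once and every background job inherits the same descriptor.

```shell
#!/bin/bash
# Analogy only: one redirection shared by all jobs, as with a single srun.
NJOBS=${NJOBS:-100}
out=$(mktemp)

job() {
    # Stand-in for the per-task script: only the designated job writes to stdout.
    if [[ $1 = 0 ]]; then
        date
    fi
}

{
    for i in $(seq 0 $(( NJOBS - 1 ))); do
        job "$i" &
    done
    wait    # keep the group alive until every job has finished
} >"$out"   # evaluated once, before any job starts

lines=$(( $(wc -l <"$out") ))
echo "lines in shared output: $lines"
rm -f "$out"
```

However the jobs are scheduled, the designated job's line is still there at the end, because no job re-opens the file with >.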



Source: https://stackoverflow.com/questions/46574606/on-the-semantics-of-srun-output-file-for-parallel-tasks
