Question
I am queuing multiple jobs in SLURM. Can I limit the number of jobs that run in parallel?
Thanks in advance!
Answer 1:
If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID> (and release them later with scontrol release <JOBID>), and you can defer the start of some jobs with sbatch --begin=YYYY-MM-DD. Also, if it is a job array, you can limit the number of tasks in the array that run concurrently with, for instance, --array=1-100%25 to have 100 tasks in the array but only 25 of them running at any time.
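For jobs that are not part of an array, a similar effect can be approximated from the submitting side by only calling sbatch while fewer than a chosen number of your jobs are queued. The following is a minimal sketch, not a Slurm feature: the function names count_my_jobs, wait_for_slot and submit_throttled and the POLL_SECONDS variable are hypothetical, and it assumes that counting your pending plus running jobs is an acceptable proxy (if at most N jobs are queued, at most N can be running).

```shell
# count_my_jobs: number of this user's jobs that are pending or running.
# squeue flags: -u user, -h no header, -t state filter, -r one line per array task.
count_my_jobs() {
    squeue -u "${USER:-$(id -un)}" -h -r -t PENDING,RUNNING | wc -l | tr -d ' '
}

# wait_for_slot MAX: block until fewer than MAX jobs are pending or running.
wait_for_slot() {
    max=$1
    while [ "$(count_my_jobs)" -ge "$max" ]; do
        sleep "${POLL_SECONDS:-30}"   # polling interval is an assumption
    done
}

# submit_throttled MAX SCRIPT...: submit each script once a slot is free.
submit_throttled() {
    limit=$1
    shift
    for script in "$@"; do
        wait_for_slot "$limit"
        sbatch "$script"
    done
}
```

On a login node this could be called as submit_throttled 10 job1.sh job2.sh job3.sh, where the script names are placeholders. The loop has to keep running until the last job is submitted, so a job array with a % limit is usually the simpler choice when the jobs are similar.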
Answer 2:
According to the SLURM Resource Limits documentation, you can limit the total number of jobs that you can run for an association/QOS with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optional) partition name.
You should be able to do something similar to:
sacctmgr modify user <userid> account=<account_name> set MaxJobs=10
I found this presentation to be very helpful in case you have more questions.
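To make the command above easier to repeat and to check, it can be wrapped in two small functions. This is a sketch under assumptions: the function names set_max_jobs and show_max_jobs are hypothetical, the user and account values are placeholders, and changing limits with sacctmgr normally requires administrative rights on the accounting database. It uses the explicit "where name=..." form of the modify command.

```shell
# set_max_jobs USER ACCOUNT LIMIT: cap the number of running jobs for one association.
# sacctmgr asks for confirmation before committing the change.
set_max_jobs() {
    sacctmgr modify user where name="$1" account="$2" set MaxJobs="$3"
}

# show_max_jobs USER: list the user's associations with their MaxJobs limit.
show_max_jobs() {
    sacctmgr show assoc where user="$1" format=Cluster,Account,User,MaxJobs
}
```

For example, set_max_jobs alice physics 10 followed by show_max_jobs alice, where alice and physics are placeholder names. Setting the value to -1 clears a previously set limit.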
Answer 3:
According to the SLURM documentation, --array=0-15%4 (with a - sign, not a :) will limit the number of simultaneously running tasks from this job array to 4.
I wrote test.sbatch:
#!/bin/bash
# test.sbatch
#
#SBATCH -J a
#SBATCH -p campus
#SBATCH -c 1
#SBATCH -o %A_%a.output
mkdir "test${SLURM_ARRAY_TASK_ID}"
# Sleep for a random time of up to 10 minutes so that the tasks show up in
# squeue and finish at different times; this makes it possible to check that
# the number of tasks running in parallel stays constant.
RANGE=600
number=$((RANDOM % RANGE))
echo "$number"
sleep "$number"
and ran it with sbatch --array=1-15%4 test.sbatch
The tasks ran as expected (always 4 in parallel): each one just created its directory and kept running for $number seconds.
Appreciate comments and suggestions.
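The check described above (watching squeue to confirm that only 4 tasks run at once) can be scripted, and the limit can be changed after submission. A minimal sketch, assuming hypothetical function names count_running_tasks and set_array_throttle and a placeholder job ID; it relies on squeue's -r option to list each array task on its own line and on the ArrayTaskThrottle field of scontrol update.

```shell
# count_running_tasks JOBID: number of tasks of array JOBID that are running now.
# -h: no header, -r: one line per array task, -t RUNNING: running tasks only.
count_running_tasks() {
    squeue -j "$1" -h -r -t RUNNING | wc -l | tr -d ' '
}

# set_array_throttle JOBID LIMIT: change the %LIMIT of an already-submitted array.
set_array_throttle() {
    scontrol update JobId="$1" ArrayTaskThrottle="$2"
}
```

With the test array above, count_running_tasks <JOBID> should not report more than 4, and set_array_throttle <JOBID> 2 would lower the limit to 2 for tasks that have not started yet.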
Answer 4:
If your jobs are relatively similar, you can use SLURM job arrays. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm
#!/bin/bash -x
#SBATCH --mail-type=NONE
#SBATCH --array=1-419%25 # Submit 419 tasks with only 25 of them running at any time
# File containing the list of 419 commands I want to run, one per line
cmd_file=s1List_170519.txt
# Select the line whose number matches this task's index.
# Note: print $1 keeps only the first whitespace-separated field; use $0 for the whole line.
cmd_line=$(awk -v var="${SLURM_ARRAY_TASK_ID}" 'NR==var {print $1}' "$cmd_file")
$cmd_line # may need to be piped to bash
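If the lines of the command file contain arguments, pipes or redirections, the script above would drop or mishandle them, because it keeps only the first field and runs it unparsed. One possible variant, sketched here with hypothetical helper names nth_line and run_task_command, reads the whole line with sed and hands it to bash -c so that it is parsed like a normal command line.

```shell
# nth_line FILE N: print line N of FILE in full (not just its first field).
nth_line() {
    sed -n "${2}p" "$1"
}

# run_task_command FILE: run the command on the line selected by SLURM_ARRAY_TASK_ID.
run_task_command() {
    cmd_line=$(nth_line "$1" "${SLURM_ARRAY_TASK_ID:?not running inside a job array}")
    if [ -z "$cmd_line" ]; then
        echo "no command on line ${SLURM_ARRAY_TASK_ID} of $1" >&2
        return 1
    fi
    # bash -c parses quoting, pipes and redirections in the stored line
    bash -c "$cmd_line"
}
```

Inside the batch script this would be called as run_task_command "$cmd_file". Since every line is executed as shell code, the command file should only come from a trusted source.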
Source: https://stackoverflow.com/questions/42812425/limit-the-number-of-running-jobs-in-slurm