Limit the number of running jobs in SLURM

粉色の甜心 2020-12-16 17:13

I am queuing multiple jobs in SLURM. Can I limit the number of parallel running jobs in slurm?

Thanks in advance!

4 Answers
  • 2020-12-16 17:34

    If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID>, and you can delay the submission of some jobs with sbatch --begin=YYYY-MM-DD. Also, if it is a job array, you can limit the number of jobs in the array that run concurrently with, for instance, --array=1-100%25 to have 100 jobs in the array but only 25 of them running at any time.
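
    The last two options above can be combined in a job-script header; a minimal sketch, where the date, job ID, and program name are placeholders:

```shell
#!/bin/bash
#SBATCH --begin=2020-12-17T02:00:00   # placeholder date: do not start before this time
#SBATCH --array=1-100%25              # 100 array tasks, at most 25 running at once

# To pause/resume an already-submitted job from the command line
# (<JOBID> is a placeholder):
#   scontrol hold <JOBID>
#   scontrol release <JOBID>

srun my_program   # placeholder for the actual work
```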

  • 2020-12-16 17:40

    According to the SLURM documentation, --array=0-15%4 (note the - sign, not :) will limit the number of simultaneously running tasks from this job array to 4.

    I wrote test.sbatch:

    #!/bin/bash
    # test.sbatch
    #
    #SBATCH -J a
    #SBATCH -p campus
    #SBATCH -c 1
    #SBATCH -o %A_%a.output
    
    mkdir test${SLURM_ARRAY_TASK_ID}
    
    # Sleep for up to 10 minutes so the jobs stay visible in squeue, each for a
    # different duration, to check that the number of parallel jobs remains constant
    RANGE=600; number=$((RANDOM % RANGE)); echo "$number"
    
    sleep $number
    

    and ran it with sbatch --array=1-15%4 test.sbatch

    Jobs ran as expected (always 4 in parallel); each created a directory and kept running for $number seconds.
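
    The random-duration trick from the script can be checked on its own, without Slurm; a minimal sketch using bash arithmetic expansion:

```shell
# Draw a pseudo-random sleep duration in [0, 600), as the batch script does
RANGE=600
number=$((RANDOM % RANGE))
echo "$number"
```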

    Appreciate comments and suggestions.

  • 2020-12-16 17:58

    According to the SLURM Resource Limits documentation, you can limit the total number of jobs that you can run for an association/qos with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optional) partition name.

    You should be able to do something similar to:

    sacctmgr modify user <userid> account=<account_name> set MaxJobs=10
    

    I found this presentation to be very helpful in case you have more questions.
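
    To check that the limit took effect, the association can be inspected with sacctmgr on the cluster (the user ID is a placeholder, and the exact field names available depend on your SLURM version):

```shell
# List the association limits for a user; MaxJobs should show the new value
sacctmgr show assoc user=<userid> format=User,Account,MaxJobs
```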

  • 2020-12-16 18:00

    If your jobs are relatively similar, you can use Slurm job arrays. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm

    #!/bin/bash -x
    #SBATCH --mail-type=NONE
    #SBATCH --array=1-419%25  # Submit 419 tasks with only 25 of them running at any time
    
    # Contains the list of 419 commands I want to run, one per line
    cmd_file=s1List_170519.txt
    
    cmd_line=$(awk -v var="${SLURM_ARRAY_TASK_ID}" 'NR==var' "$cmd_file")    # Get the whole line numbered $SLURM_ARRAY_TASK_ID, not just its first word
    
    $cmd_line  # may need to be piped to bash
    
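
    The line-extraction step can be tested locally without Slurm; a minimal sketch that sets SLURM_ARRAY_TASK_ID by hand and uses a throwaway command file (all file contents here are illustrative):

```shell
# Simulate what one array task does: pick line $SLURM_ARRAY_TASK_ID from a
# command file and execute it. Slurm exports SLURM_ARRAY_TASK_ID for each
# array task at run time; here we set it by hand.
cmd_file=$(mktemp)
printf '%s\n' 'echo first' 'echo second' 'echo third' > "$cmd_file"

SLURM_ARRAY_TASK_ID=2

# Print the whole line (default awk action), so commands keep their arguments
cmd_line=$(awk -v var="${SLURM_ARRAY_TASK_ID}" 'NR==var' "$cmd_file")
output=$(eval "$cmd_line")
echo "$output"
rm -f "$cmd_file"
```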