Make use of all CPUs on SLURM

Submitted by 空扰寡人 on 2021-01-27 19:52:00

Question


I would like to run a job on the cluster. Different nodes have different numbers of CPUs, and I have no idea which nodes will be assigned to me. What are the proper options so that the job creates as many tasks as there are CPUs across all allocated nodes?

#!/bin/bash -l

#SBATCH -p normal
#SBATCH -N 4
#SBATCH -t 96:00:00

srun -n 128 ./run

Answer 1:


One quick hack is to use the environment variables that SLURM sets inside the job. A sample sbatch file:

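#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --time=10:00
#SBATCH --nodes=2

# CPUs on the node running this script, and number of allocated nodes
echo "$SLURM_CPUS_ON_NODE"
echo "$SLURM_JOB_NUM_NODES"
num_core=$SLURM_CPUS_ON_NODE
num_node=$SLURM_JOB_NUM_NODES

# Total tasks = CPUs per node * nodes (valid only for identical nodes)
proc_num=$(( num_core * num_node ))
echo "$proc_num"
srun -n "$proc_num" ./run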

The job script requests only the number of nodes. $SLURM_CPUS_ON_NODE provides the number of CPUs per node, and you can combine it with other environment variables (e.g. $SLURM_JOB_NUM_NODES) to work out how many tasks are possible. The script above computes the task count dynamically on the assumption that the nodes are homogeneous (i.e. $SLURM_CPUS_ON_NODE yields a single number that holds for every node).

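For heterogeneous nodes, a single value no longer describes the allocation. According to the sbatch documentation, $SLURM_CPUS_ON_NODE reports only the count for the node running the batch script, while the per-node counts for the whole allocation appear in $SLURM_JOB_CPUS_PER_NODE as a comma-separated list (e.g. 2,3 if the allocated nodes have 2 and 3 CPUs). In that scenario, you can use that list, or look up the CPU count of each node named in $SLURM_JOB_NODELIST, to calculate the required number of tasks.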
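For the heterogeneous case, the task count can be obtained by summing the per-node CPU list. The sketch below is one possible way to do it, assuming the list is available in $SLURM_JOB_CPUS_PER_NODE in the format the sbatch documentation describes (e.g. 72(x2),36, where (xN) means N nodes with that count). The helper name total_cpus is hypothetical, and the 2,3 fallback exists only so the script can be tried outside a job.

```bash
#!/bin/bash
# Sum a per-node CPU list such as "72(x2),36" (assumed format of
# $SLURM_JOB_CPUS_PER_NODE). total_cpus is a hypothetical helper name.
total_cpus() {
    local spec=$1 total=0 entry count repeat
    local -a entries
    IFS=',' read -ra entries <<< "$spec"
    for entry in "${entries[@]}"; do
        count=${entry%%(*}          # CPU count before any "(xN)" suffix
        repeat=1
        if [[ $entry == *"(x"* ]]; then
            repeat=${entry##*\(x}   # drop everything up to and including "(x"
            repeat=${repeat%\)}     # drop the trailing ")"
        fi
        total=$(( total + count * repeat ))
    done
    echo "$total"
}

# Inside a job SLURM sets the variable; "2,3" is a dry-run fallback only.
proc_num=$(total_cpus "${SLURM_JOB_CPUS_PER_NODE:-2,3}")
echo "$proc_num"
# Inside the job script, launch one task per CPU:
# srun -n "$proc_num" ./run
```

With this in place, the sbatch file still requests only --nodes, and the srun line uses whatever total the allocation actually provides.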



Source: https://stackoverflow.com/questions/57466957/make-use-of-all-cpus-on-slurm
