Question
Is there a way to oversubscribe GPUs on Slurm, i.e. run multiple jobs/job steps that share one GPU? We've only found ways to oversubscribe CPUs and memory, but not GPUs.
We want to run multiple job steps on the same GPU in parallel and optionally specify the GPU memory used for each step.
Answer 1:
The easiest way to do that is to define the GPU as a node feature rather than as a GRES. Slurm will then not manage the GPUs at all; it will only make sure that jobs that need one land on nodes that offer one.
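A minimal sketch of what this might look like, assuming two hypothetical GPU nodes `gpu[01-02]`. The Python helpers below only render the `slurm.conf` node definition (advertising `gpu` through the `Features=` node parameter, with no `Gres=` entry) and the matching `sbatch --constraint=gpu` submission. The helper names, node names, CPU counts, memory sizes and `train.sh` are illustrative assumptions. Because Slurm does not track the GPU in this setup, it will not enforce any per-step GPU memory limit either; that would have to be handled by the application itself.

```python
"""Sketch: advertise GPUs as a node feature instead of a GRES.

All node names, CPU counts, memory sizes and the script name are
hypothetical placeholders; adapt them to your cluster.
"""


def node_line(names: str, cpus: int, real_memory_mb: int, features: list[str]) -> str:
    """Build a slurm.conf NodeName line that advertises `features`.

    No Gres= parameter is emitted, so Slurm neither counts nor allocates
    the GPUs on these nodes; every job placed on the node can access them.
    """
    return (
        f"NodeName={names} CPUs={cpus} RealMemory={real_memory_mb} "
        f"Features={','.join(features)}"
    )


def sbatch_command(script: str, feature: str = "gpu") -> list[str]:
    """Build an sbatch invocation that only schedules on nodes with `feature`."""
    return ["sbatch", f"--constraint={feature}", script]


if __name__ == "__main__":
    # Line to place in slurm.conf for the two hypothetical GPU nodes.
    print(node_line("gpu[01-02]", 32, 128000, ["gpu"]))
    # Command a user would run; several such jobs can share a node's GPU.
    print(" ".join(sbatch_command("train.sh")))
```

Jobs submitted this way only land on nodes whose feature list includes `gpu`; which GPU each process uses, and how much of its memory, is left entirely to the processes themselves.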
Source: https://stackoverflow.com/questions/55186407/slurm-oversubscribe-gpus