SLURM: After allocating all GPUs, no more CPU jobs can be submitted

Posted by 末鹿安然 on 2019-12-02 02:48:06

Question


We have just started using Slurm to manage our GPUs (currently just 2). We use Ubuntu 14.04 and slurm-llnl. I have configured gres.conf and srun works. The problem is that if I run two jobs with --gres=gpu:1, the two GPUs are successfully allocated and the jobs start running. I then expect to be able to run more jobs (in addition to the 2 GPU jobs) without --gres=gpu:1 (i.e. jobs that only use CPU and RAM), but that is not possible.

The error message says that it could not allocate required resources (even though there are 24 CPU cores).

This is my gres.conf:

Name=gpu Type=titanx File=/dev/nvidia0
Name=gpu Type=titanx File=/dev/nvidia1
NodeName=ubuntu Name=gpu Type=titanx File=/dev/nvidia[0-1]

I appreciate any help. Thank you.


Answer 1:


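Make sure that SelectType in your slurm.conf is select/cons_res with SelectTypeParameters set to CR_CPU or CR_Core, and that the Shared option of the partition (named OverSubscribe in newer Slurm releases) is not set to EXCLUSIVE. Otherwise Slurm allocates full nodes to jobs, so once the two GPU jobs hold the node, no CPU-only job can start on it even though cores are free.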
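
As a rough illustration of these settings, a slurm.conf fragment might look like the sketch below. The node name ubuntu and the 24 CPUs are taken from the question; the partition name debug, the Gres count syntax, and the State values are assumptions made for the example, so adapt them to your own cluster rather than copying them verbatim.

```
# slurm.conf (fragment): a sketch, not a complete configuration
# Allocate individual cores instead of whole nodes
SelectType=select/cons_res
SelectTypeParameters=CR_Core

# Enable the generic resource type defined in gres.conf
GresTypes=gpu

# Node name and CPU count come from the question; Gres=gpu:2 is an assumption
NodeName=ubuntu CPUs=24 Gres=gpu:2 State=UNKNOWN

# "debug" is a hypothetical partition name; Shared must not be EXCLUSIVE
PartitionName=debug Nodes=ubuntu Default=YES Shared=NO State=UP
```

After changing SelectType, the Slurm daemons generally need to be restarted rather than just reconfigured. You can check the value actually in effect with scontrol show config and look for the SelectType and SelectTypeParameters lines. Newer Slurm releases also provide select/cons_tres, which plays the same role.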



Source: https://stackoverflow.com/questions/37093705/slurm-after-allocating-all-gpus-no-more-cpu-job-can-be-submitted
