Emulating SLURM on Ubuntu 16.04

99封情书 提交于 2019-12-02 12:35:42

So ... we have an existing cluster here but it runs an older Ubuntu version which does not mesh well with my workstation running 17.04.

So on my workstation, I just made sure I slurmctld (backend) and slurmd installed, and then set up a trivial slurm.conf with

ControlMachine=mybox
# ...
NodeName=DEFAULT CPUs=4 RealMemory=4000 TmpDisk=50000 State=UNKNOWN
NodeName=mybox CPUs=4 RealMemory=16000

after which I restarted slurmcltd and then slurmd. Now all is fine:

root@mybox:/etc/slurm-llnl$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
demo         up   infinite      1   idle mybox
root@mybox:/etc/slurm-llnl$ 

This is a degenerate setup, our real one has a mix of dev and prod machine and appropriate partitions. But this should answer your "can backend really be client" question. Also, my machine is not really called mybox but is not really pertinent for the question in either case.

Using Ubuntu 17.04, all stock, with munge to communicate (which is the default anyway).

Edit: To wit:

me@mybox:~$ COLUMNS=90 dpkg -l '*slurm*' | grep ^ii
ii  slurm-client     16.05.9-1ubun amd64         SLURM client side commands
ii  slurm-wlm-basic- 16.05.9-1ubun amd64         SLURM basic plugins
ii  slurmctld        16.05.9-1ubun amd64         SLURM central management daemon
ii  slurmd           16.05.9-1ubun amd64         SLURM compute node daemon
me@mybox:~$

I would still prefer to run SLURM natively, but I caved and spun up a Debian 9.2 VM. See here for my efforts to troubleshoot a native installation. The directions here worked smoothly, but I needed to make the following changes to slurm.conf. Below, Debian64 is the hostname, and wlandau is my user name.

  • ControlMachine=Debian64
  • SlurmUser=wlandau
  • NodeName=Debian64

Here is the complete slurm.conf. An analogous slurm.conf did not work on my native Ubuntu 16.04.

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=Debian64
#ControlAddr=
#BackupController=
#BackupAddr=
# 
AuthType=auth/munge
#CheckpointType=checkpoint/none 
CryptoType=crypto/munge
#DisableRootJobs=NO 
#EnforcePartLimits=NO 
#Epilog=
#EpilogSlurmctld= 
#FirstJobId=1 
#MaxJobId=999999 
#GresTypes= 
#GroupUpdateForce=0 
#GroupUpdateTime=600 
#JobCheckpointDir=/var/lib/slurm-llnl/checkpoint 
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0 
#JobRequeue=1 
#JobSubmitPlugins=1 
#KillOnBadExit=0 
#LaunchType=launch/slurm 
#Licenses=foo*4,bar 
#MailProg=/usr/bin/mail 
#MaxJobCount=5000 
#MaxStepCount=40000 
#MaxTasksPerNode=128 
MpiDefault=none
#MpiParams=ports=#-# 
#PluginDir= 
#PlugStackConfig= 
#PrivateData=jobs 
ProctrackType=proctrack/pgid
#Prolog=
#PrologFlags= 
#PrologSlurmctld= 
#PropagatePrioProcess=0 
#PropagateResourceLimits= 
#PropagateResourceLimitsExcept= 
#RebootProgram= 
ReturnToService=1
#SallocDefaultCommand= 
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=wlandau
#SlurmdUser=root 
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree 
#TmpFS=/tmp 
#TrackWCKey=no 
#TreeWidth= 
#UnkillableStepProgram= 
#UsePAM=0 
# 
# 
# TIMERS 
#BatchStartTimeout=10 
#CompleteWait=0 
#EpilogMsgTime=2000 
#GetEnvTimeout=2 
#HealthCheckInterval=0 
#HealthCheckProgram= 
InactiveLimit=0
KillWait=30
#MessageTimeout=10 
#ResvOverRun=0 
MinJobAge=300
#OverTimeLimit=0 
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60 
#VSizeFactor=0 
Waittime=0
# 
# 
# SCHEDULING 
#DefMemPerCPU=0 
FastSchedule=1
#MaxMemPerCPU=0 
#SchedulerRootFilter=1 
#SchedulerTimeSlice=30 
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
# 
# 
# JOB PRIORITY 
#PriorityFlags= 
#PriorityType=priority/basic 
#PriorityDecayHalfLife= 
#PriorityCalcPeriod= 
#PriorityFavorSmall= 
#PriorityMaxAge= 
#PriorityUsageResetPeriod= 
#PriorityWeightAge= 
#PriorityWeightFairshare= 
#PriorityWeightJobSize= 
#PriorityWeightPartition= 
#PriorityWeightQOS= 
# 
# 
# LOGGING AND ACCOUNTING 
#AccountingStorageEnforce=0 
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags= 
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none 
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#SlurmSchedLogFile= 
#SlurmSchedLogLevel= 
# 
# 
# POWER SAVE SUPPORT FOR IDLE NODES (optional) 
#SuspendProgram= 
#ResumeProgram= 
#SuspendTimeout= 
#ResumeTimeout= 
#ResumeRate= 
#SuspendExcNodes= 
#SuspendExcParts= 
#SuspendRate= 
#SuspendTime= 
# 
# 
# COMPUTE NODES 
NodeName=Debian64 CPUs=1 RealMemory=744 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN 
PartitionName=debug Nodes=Debian64 Default=YES MaxTime=INFINITE State=UP
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!