SLURM

CS uses the SLURM resource scheduler to control cluster resources. The slurm system is controlled by a head node called slurm.cs.virginia.edu (a CNAME for coresrv05). This node receives job requests from the portal nodes and sends the jobs to the slurm compute nodes to be executed.

General Slurm Info

Running sinfo -a on the slurm head node (power1-power6) shows the status of the slurm nodes:

root@slurm:~# sinfo -a
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*            up   infinite      3  down* artemis[6-7],hermes4
main*            up   infinite      8   idle artemis[1-5],hermes[1-3]
qdata            up   infinite      7   idle qdata[1-3,5-8]
qdata            up   infinite      1   down qdata4
qdata-preempt    up   infinite      7   idle qdata[1-3,5-8]
qdata-preempt    up   infinite      1   down qdata4
falcon           up   infinite      1    mix falcon1
falcon           up   infinite      9   idle falcon[2-10]
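To pick out only the problem nodes from that listing, a small filter along these lines may help. It is a sketch, not part of our tooling: the function name unhealthy_nodes is made up for illustration, and it assumes the default sinfo column order shown above (STATE in column 5, NODELIST in column 6).

```shell
# Print "PARTITION STATE NODELIST" for every sinfo line whose state
# starts with down, drain or drng (the compact form of draining).
# Reads sinfo output on stdin; the header line never matches.
unhealthy_nodes() {
    awk '$5 ~ /^(down|drain|drng)/ { print $1, $5, $6 }'
}

# Usage on the head node (requires sinfo):
#   sinfo -a | unhealthy_nodes
```

With the output above, this would list artemis[6-7],hermes4 and qdata4 (once per partition they belong to).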

If everything is up and running on a node, things should look like this:

 root@artemis6:~# service munge status
 munged (pid 1568) is running

 root@artemis6:~# service slurm-llnl status
 slurmd (pid 32230) is running...

However, if either service is down, bring it back up with

service munge start

or

service slurm-llnl start
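The check-then-start routine can be wrapped in a helper. This is a minimal sketch: ensure_running is a hypothetical name, and it assumes the init scripts return a non-zero exit status from "status" when the daemon is stopped, which is worth verifying on the node before relying on it.

```shell
# Start a service only if its status check fails.
# Argument: service name (e.g. munge or slurm-llnl).
ensure_running() {
    svc="$1"
    if service "$svc" status >/dev/null 2>&1; then
        echo "$svc: already running"
    else
        echo "$svc: not running, starting"
        service "$svc" start
    fi
}

# Usage on a compute node, as root:
#   ensure_running munge && ensure_running slurm-llnl
```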

Potential Issues

munged

There is a bug that we see from time to time on the Ubuntu 14.04 slurm nodes. Sometimes munged will fail to start on boot because there is no directory in /var/run for munged to put a socket file.

root@artemis6:~# service munge start
* Starting MUNGE munged                                                 [fail] 
munged: Warning: Logfile is insecure: group-writable permissions set on "/var/log"
munged: Error: Failed to create "/var/run/munge/munge.socket.2.lock": Permission denied

Sometimes the directory exists but is owned by root; sometimes it doesn't exist at all. Create the directory if it is missing, chown it to the munge user, and then start the service:

root@artemis6:~# mkdir /var/run/munge
root@artemis6:~# ls -ld /var/run/munge/
drwxr-xr-x 2 root root 40 Apr 25 10:39 /var/run/munge/
root@artemis6:~# chown munge /var/run/munge/
root@artemis6:~# service munge start
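Both cases (missing directory, wrong owner) can be handled in one step. The helper below is only a sketch; fix_rundir is a hypothetical name, and the defaults simply mirror the path and owner used in the commands above.

```shell
# Make sure a runtime directory exists and belongs to the given owner.
# Arguments: directory (default /var/run/munge), owner (default munge).
fix_rundir() {
    dir="${1:-/var/run/munge}"
    owner="${2:-munge}"
    mkdir -p "$dir" || return 1      # no error if it already exists
    chown "$owner" "$dir" || return 1
    ls -ld "$dir"                    # show the result for a quick check
}

# Usage on the affected node, as root:
#   fix_rundir && service munge start
```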

Node is stuck DOWN|DRAIN|etc

Sometimes you will find that a node has both munge and slurm-llnl running properly, but sinfo still shows the node as either down or drain and you can't send jobs to the host. You can use scontrol to manually set the node to state RESUME:

root@slurm:~# scontrol update NodeName=slurm[1-5] State=RESUME

If the node is in a drain state, we sometimes have to set it to DOWN first before we can resume it:

root@slurm:~# scontrol update NodeName=qdata2 State=DOWN Reason=temp
root@slurm:~# scontrol update NodeName=qdata2 State=RESUME
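The two variants can be combined into one helper. This is a sketch under the assumption that it runs as root on the controller; the name resume_node and the "down-first" argument are invented here for illustration.

```shell
# Return a node to service.
# Arguments: node name, optional "down-first" to set the node DOWN
# before resuming it (for nodes stuck in a drain state).
resume_node() {
    node="$1"
    mode="${2:-}"
    if [ "$mode" = "down-first" ]; then
        scontrol update NodeName="$node" State=DOWN Reason=temp || return 1
    fi
    scontrol update NodeName="$node" State=RESUME
}

# Usage:
#   resume_node artemis6
#   resume_node qdata2 down-first
```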

Sometimes a node is in a “drain” state because the node definition in slurm.conf differs from what the node itself reports. For example, if the definition in the slurm.conf file states that a node has 128G of memory, but the slurm daemon finds that the node only has 96G, the daemon will mark the node as “drain”. The only way to clear this is to correct the node definition in the slurm.conf file on all nodes AND the controller node, then run 'scontrol reconfigure' on the controller and the node to re-read the slurm.conf file.
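One way to spot a memory mismatch before editing slurm.conf is to compare the RealMemory value slurm has configured with the one the daemon detects. The sketch below assumes that "scontrol show node" and "slurmd -C" both print a RealMemory=<MB> field on this slurm version; the function names are hypothetical.

```shell
# Extract the first RealMemory value (in MB) from text on stdin.
real_memory() {
    grep -o 'RealMemory=[0-9]*' | head -n 1 | cut -d= -f2
}

# Succeed if the detected memory is at least the configured memory.
# Arguments: configured MB, detected MB.
memory_ok() {
    configured="$1"
    detected="$2"
    [ "$detected" -ge "$configured" ]
}

# Usage on a compute node (requires slurm):
#   configured=$(scontrol show node "$(hostname -s)" | real_memory)
#   detected=$(slurmd -C | real_memory)
#   memory_ok "$configured" "$detected" || echo "slurm.conf overstates memory"
```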

Reservations

Reservations allow us to reserve access to a node or set of nodes for a user or set of users. To control reservations, we must log onto the slurm controller slurm.cs.virginia.edu.

Create a Reservation

root@slurm:~# scontrol create reservation starttime=2018-04-09T01:00:00 endtime=2018-07-09T16:00:00 user=jc6ub nodes=ai06
Reservation created: jc6ub_6

Common Options:

starttime	Date stamp of the beginning of reservation.
endtime	Date stamp of the end of reservation.
duration	Duration in minutes.
user	User(s) who can use the reservation. Multiple users may be listed, comma separated.
nodes	Node(s) that the reservation covers. Multiple nodes may be listed, comma separated.

Make note of the output of this command. The name of the reservation (in this case jc6ub_6) must be used to submit jobs to the reservation. The user will have to add --reservation=jc6ub_6 to any srun or sbatch commands:

srun -w ai06 --reservation=jc6ub_6 slurm-script.sh
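Since the reservation name is only shown once, at creation time, it can be handy to capture it directly. A minimal sketch, assuming scontrol prints the "Reservation created: NAME" line exactly as above; reservation_name is a hypothetical helper.

```shell
# Print the reservation name from scontrol's "Reservation created: NAME"
# line. Prints nothing if the line is absent (e.g. creation failed).
reservation_name() {
    sed -n 's/^Reservation created: //p'
}

# Usage on the controller (requires scontrol):
#   res=$(scontrol create reservation starttime=2018-04-09T01:00:00 \
#         endtime=2018-07-09T16:00:00 user=jc6ub nodes=ai06 | reservation_name)
#   echo "Submit with: srun --reservation=$res ..."
```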

If a node has running jobs:

If the node you want to reserve is already allocated and running jobs, you must first drain the node and let running jobs finish. You can do this using scontrol:

scontrol update NodeName=ai06 State=DRAIN Reason="Setting up reservation"

Now wait until all jobs finish and the node state is IDLE. Once the node is idle, create the reservation and then finally resume the node:

scontrol update NodeName=ai06 State=RESUME

View Reservations

root@slurm:~# scontrol show res
ReservationName=jc6ub_5 StartTime=2018-04-09T16:00:00 EndTime=2018-07-09T16:00:00 Duration=91-00:00:00
   Nodes=ai06 NodeCnt=1 CoreCnt=16 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   TRES=cpu=32
   Users=jc6ub Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
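When there are many reservations, the multi-line output gets hard to scan. The filter below condenses it to one line per reservation. It is a sketch that assumes the key=value layout shown above (Nodes listed before State); summarize_reservations is a made-up name.

```shell
# Condense `scontrol show res` output to "NAME STATE NODES" per reservation.
# Reads the output on stdin.
summarize_reservations() {
    awk '{
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "ReservationName") name = kv[2]
            if (kv[1] == "Nodes") nodes = kv[2]
            if (kv[1] == "State") print name, kv[2], nodes
        }
    }'
}

# Usage on the controller (requires scontrol):
#   scontrol show res | summarize_reservations
```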

Deleting a Reservation

root@slurm:~# scontrol delete ReservationName=jc6ub_5

Updating a Reservation

We can use scontrol to update different reservation parameters. For instance, if we want to add/remove users who can use a reservation we can update the users parameter. When we run the command, we want to include the entire list of users, not just the users we want to add.

If our reservation already allows user1 and user2, but we want to add user3 then we would run this command:

root@slurm:~# scontrol update ReservationName=example_res users=user1,user2,user3
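Because the users parameter takes the whole list, it is easy to drop someone by accident. A small helper can build the new list from the current one. This is only a sketch; append_user is a hypothetical name and the current list still has to be read from "scontrol show res" by hand.

```shell
# Build a comma-separated user list with one user added.
# Arguments: current list (may be empty), user to add.
# The list is returned unchanged if the user is already in it.
append_user() {
    current="$1"
    new="$2"
    case ",$current," in
        *",$new,"*) echo "$current" ;;
        ,,)         echo "$new" ;;
        *)          echo "$current,$new" ;;
    esac
}

# Usage on the controller:
#   scontrol update ReservationName=example_res users="$(append_user user1,user2 user3)"
```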

To change the end date for a reservation we would run

root@slurm:~# scontrol update ReservationName=example_res endtime=2018-11-09T16:00:00

If a Node Has Jobs Running

You will not be able to create a reservation on a node that currently has running jobs. We must set the node to a drain state and wait for the jobs to complete.

root@slurm:~# scontrol create reservation starttime=2018-08-20T01:00:00 endtime=2018-11-13T16:00:00 user=jc6ub nodes=ai06
Error creating the reservation: Requested nodes are busy
root@slurm:~# scontrol update NodeName=ai06 State=DRAIN
You must specify a reason when DOWNING or DRAINING a node. Request denied
root@slurm:~# scontrol update NodeName=ai06 State=DRAIN Reason="Draining node to set reservation"

After all jobs have finished running you should be able to create the reservation.
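Rather than re-running sinfo by hand, the wait can be scripted. The sketch below assumes that sinfo's extended state format (%T) reports a node with no remaining jobs as "drained" (or "idle"); wait_for_drained is a hypothetical name and the 60-second default is arbitrary.

```shell
# Poll sinfo until the node has no running jobs left.
# Arguments: node name, seconds between polls (default 60).
wait_for_drained() {
    node="$1"
    interval="${2:-60}"
    while :; do
        # A node in several partitions is listed once per partition;
        # the first line is enough.
        state=$(sinfo -a -h -n "$node" -o '%T' | head -n 1)
        case "$state" in
            drained*|idle*) echo "$node is $state"; return 0 ;;
        esac
        echo "$node is ${state:-unknown}, waiting"
        sleep "$interval"
    done
}

# Usage on the controller, after setting the node to DRAIN:
#   wait_for_drained ai06 && scontrol create reservation ...
```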
