SLURM
CS uses the SLURM resource scheduler to control cluster resources. The slurm system is controlled by a head node called slurm.cs.virginia.edu
(a CNAME for coresrv05). This node receives job requests from the portal nodes and sends the jobs to the slurm compute nodes to be executed.
General Slurm Info
Running sinfo -a on slurm head (power1-power6) will show you the status of slurm nodes:
root@slurm:~# sinfo -a
PARTITION     AVAIL TIMELIMIT NODES STATE NODELIST
main*         up    infinite  3     down* artemis[6-7],hermes4
main*         up    infinite  8     idle  artemis[1-5],hermes[1-3]
qdata         up    infinite  7     idle  qdata[1-3,5-8]
qdata         up    infinite  1     down  qdata4
qdata-preempt up    infinite  7     idle  qdata[1-3,5-8]
qdata-preempt up    infinite  1     down  qdata4
falcon        up    infinite  1     mix   falcon1
falcon        up    infinite  9     idle  falcon[2-10]
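To pick out only the nodes that need attention, the sinfo output can be filtered on the STATE column. This is a minimal sketch, assuming the default column layout shown above (STATE is the fifth field). The function name problem_nodes is made up for this page and is not part of slurm.

```shell
# problem_nodes: read `sinfo -a` output on stdin and print the partition,
# state, and node list for every row whose STATE starts with "down" or "drain".
# Assumes the default sinfo columns:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
problem_nodes() {
    awk '$5 ~ /^(down|drain)/ { print $1, $5, $6 }'
}

# Usage on the head node:
#   sinfo -a | problem_nodes
```

A node that belongs to several partitions is listed once per partition, as qdata4 is in the output above.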
If everything is up and running on a node, things should look like this:
root@artemis6:~# service munge status
munged (pid 1568) is running
root@artemis6:~# service slurm-llnl status
slurmd (pid 32230) is running...
However, if either service is down, bring it back up with
service munge start
or
service slurm-llnl start
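The check-then-start routine can be wrapped in a small helper. This is a sketch only: it assumes that `service NAME status` exits non-zero when the daemon is not running (the usual init script convention), and ensure_running is a hypothetical name.

```shell
# ensure_running: start a service only if its status check fails.
# Assumption: `service NAME status` returns non-zero when the daemon is down.
ensure_running() {
    name=$1
    if service "$name" status >/dev/null 2>&1; then
        echo "$name: already running"
    else
        echo "$name: not running, starting"
        service "$name" start
    fi
}

# Usage on a compute node, as root:
#   ensure_running munge
#   ensure_running slurm-llnl
```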
Potential Issues
munged
There is a bug that we see from time to time on the Ubuntu 14.04 slurm nodes. Sometimes munged will fail to start on boot because there is no directory in /var/run for munged to put a socket file.
root@artemis6:~# service munge start
 * Starting MUNGE munged [fail]
munged: Warning: Logfile is insecure: group-writable permissions set on "/var/log"
munged: Error: Failed to create "/var/run/munge/munge.socket.2.lock": Permission denied
Sometimes the directory exists but is owned by root. Sometimes the directory doesn't exist at all. Create the directory if it is missing, chown it to the munge user, and then start the service:
root@artemis6:~# mkdir /var/run/munge
root@artemis6:~# ls -ld /var/run/munge/
drwxr-xr-x 2 root root 40 Apr 25 10:39 /var/run/munge/
root@artemis6:~# chown munge /var/run/munge/
root@artemis6:~# service munge start
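Because the directory is sometimes missing and sometimes just wrongly owned, it helps to have one command that handles both cases. This is a rough sketch, not an official fix; fix_munge_rundir is a hypothetical helper, and its two optional arguments exist only so that the path and owner can be overridden.

```shell
# fix_munge_rundir: make sure the munge runtime directory exists and is
# owned by the munge user, then show the result.
# Defaults follow the path in the error message above.
fix_munge_rundir() {
    dir=${1:-/var/run/munge}
    owner=${2:-munge}
    # mkdir -p succeeds whether or not the directory already exists.
    mkdir -p "$dir" && chown "$owner" "$dir" && ls -ld "$dir"
}

# Usage on the affected node, as root:
#   fix_munge_rundir && service munge start
```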
Node is stuck DOWN|DRAIN|etc
Sometimes you will find that a node has both munge and slurm-llnl running properly, but sinfo still shows the node as either down or drain and you can't send jobs to the host. You can use scontrol to manually set the node to state RESUME:
root@slurm:~# scontrol update NodeName=slurm[1-5] State=RESUME
If the node is in a drain state, sometimes we have to down the node first before we can resume it:
root@slurm:~# scontrol update NodeName=qdata2 State=DOWN Reason=temp
root@slurm:~# scontrol update NodeName=qdata2 State=RESUME
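The down-then-resume pair can be run as one step so that the RESUME is only attempted if the DOWN was accepted. A minimal sketch; force_resume is a made-up name, and the default reason string is just the placeholder used above.

```shell
# force_resume: DOWN a node with a reason, then RESUME it.
# Stops at the first scontrol failure.
force_resume() {
    node=$1
    reason=${2:-temp}
    scontrol update NodeName="$node" State=DOWN Reason="$reason" &&
        scontrol update NodeName="$node" State=RESUME
}

# Usage on the head node:
#   force_resume qdata2
#   force_resume qdata2 "clearing stuck drain"
```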
Sometimes a node is in a “drain” state because the node definition in slurm.conf differs from what the node itself reports. For example, if slurm.conf states that a node has 128G of memory, but the slurm daemon finds that the node only has 96G, the daemon will mark the node as “drain”. The only way to clear this is to correct the node definition in slurm.conf on all nodes AND the controller node, then run 'scontrol reconfigure' on the controller and the node so that slurm.conf is re-read.
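One way to confirm a memory mismatch is to compare the RealMemory value in the node's slurm.conf definition with the value that `slurmd -C` prints for the hardware it detects. The helpers below are a sketch under that assumption. real_memory and memory_mismatch are hypothetical names, and the example NodeName line in the usage comment is made up.

```shell
# real_memory: read text on stdin and print the first RealMemory value found.
# slurm expresses RealMemory in megabytes.
real_memory() {
    sed -n 's/.*RealMemory=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# memory_mismatch: succeed (exit 0) when the configured value is larger than
# the detected value, which is the situation described above.
memory_mismatch() {
    configured=$1
    detected=$2
    [ "$configured" -gt "$detected" ]
}

# Usage on the node (feed real_memory the NodeName line that covers this node):
#   configured=$(echo 'NodeName=qdata2 RealMemory=128000' | real_memory)
#   detected=$(slurmd -C | real_memory)
#   memory_mismatch "$configured" "$detected" && echo "slurm.conf overstates memory"
```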
Reservations
Reservations allow us to reserve access to a node or set of nodes for a user or set of users. To control reservations, we must log onto the slurm controller slurm.cs.virginia.edu.
Create a Reservation
root@slurm:~# scontrol create reservation starttime=2018-04-09T01:00:00 endtime=2018-07-09T16:00:00 user=jc6ub nodes=ai06
Reservation created: jc6ub_6
Common Options:
starttime | Date stamp of the beginning of reservation. |
endtime | Date stamp of the end of reservation. |
duration | Duration in minutes. |
user | User(s) who can use the reservation. Multiple users may be listed, comma separated. |
nodes | Node(s) that the reservation covers. Multiple nodes may be listed, comma separated. |
Make note of the output of this command. The name of the reservation (in this case jc6ub_6) must be used to submit jobs to the reservation. The user will have to add --reservation=jc6ub_6 to any srun or sbatch commands:
srun -w ai06 --reservation=jc6ub_6 slurm-script.sh
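Since the reservation name is only known from the output of the create command, it can be captured instead of copied by hand. A small sketch, assuming the "Reservation created: NAME" output format shown above; reservation_name is a hypothetical helper.

```shell
# reservation_name: read `scontrol create reservation` output on stdin and
# print just the reservation name from the "Reservation created:" line.
reservation_name() {
    sed -n 's/^Reservation created: //p'
}

# Usage on the controller:
#   res=$(scontrol create reservation starttime=2018-04-09T01:00:00 \
#         endtime=2018-07-09T16:00:00 user=jc6ub nodes=ai06 | reservation_name)
#   echo "Tell the user to add --reservation=$res"
```

If the create command fails, the helper prints nothing, so an empty result means the reservation was not created.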
If a node has running jobs:
If the node you want to reserve is already allocated and running jobs, you must first drain the node and let running jobs finish. You can do this using scontrol:
scontrol update NodeName=ai06 State=DRAIN Reason="Setting up reservation"
Now wait until all jobs finish and the node state is IDLE. Once the node is idle, issue the reservation and then finally resume the node:
scontrol update NodeName=ai06 State=RESUME
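Waiting for the jobs to finish can be automated by polling sinfo. The sketch below is an assumption-laden example rather than a tested procedure. wait_until_drained is a made-up name, and it accepts either an idle or a drained state, because a drained node with no jobs left may be reported as drained rather than idle.

```shell
# wait_until_drained: poll sinfo until the node reports idle or drained.
# Arguments: node name, seconds between polls (default 60).
wait_until_drained() {
    node=$1
    interval=${2:-60}
    while :; do
        # -h drops the header, -n limits output to one node,
        # and %T prints the long state name.
        state=$(sinfo -a -h -n "$node" -o '%T' | head -n 1)
        case "$state" in
            idle*|drained*) echo "$node is $state"; return 0 ;;
        esac
        sleep "$interval"
    done
}

# Usage on the controller, after setting the node to DRAIN:
#   wait_until_drained ai06 && echo "ready to create the reservation"
```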
View Reservations
root@slurm:~# scontrol show res
ReservationName=jc6ub_5 StartTime=2018-04-09T16:00:00 EndTime=2018-07-09T16:00:00 Duration=91-00:00:00
   Nodes=ai06 NodeCnt=1 CoreCnt=16 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   TRES=cpu=32
   Users=jc6ub Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
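With several reservations the full output gets long. A one-line-per-reservation summary can be built from the key=value fields shown above. This sketch assumes the records arrive one per line, as `scontrol -o show res` prints them; reservation_summary is a hypothetical name.

```shell
# reservation_summary: read one reservation record per line on stdin and
# print "name state endtime users" for each record.
reservation_summary() {
    awk '{
        name = ""; state = "?"; endtime = "?"; users = "?"
        for (i = 1; i <= NF; i++) {
            # Split each field at "="; kv[1] is the key, kv[2] the value.
            split($i, kv, "=")
            if (kv[1] == "ReservationName") name = kv[2]
            else if (kv[1] == "State") state = kv[2]
            else if (kv[1] == "EndTime") endtime = kv[2]
            else if (kv[1] == "Users") users = kv[2]
        }
        if (name != "") print name, state, endtime, users
    }'
}

# Usage on the controller:
#   scontrol -o show res | reservation_summary
```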
Deleting a Reservation
root@slurm:~# scontrol delete ReservationName=jc6ub_5
Updating a Reservation
We can use scontrol to update different reservation parameters. For instance, if we want to add/remove users who can use a reservation we can update the users parameter. When we run the command, we want to include the entire list of users, not just the users we want to add.
If our reservation already allows user1 and user2, but we want to add user3 then we would run this command:
root@slurm:~# scontrol update ReservationName=example_res users=user1,user2,user3
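To avoid dropping someone when retyping the list, the current users can be read back from the reservation and extended. A minimal sketch; current_users and append_user are hypothetical helpers, and the Users= field name is taken from the scontrol show res output.

```shell
# current_users: read `scontrol show res NAME` output on stdin and print
# the value of the Users= field.
current_users() {
    sed -n 's/.*Users=\([^ ]*\).*/\1/p' | head -n 1
}

# append_user: print the comma-separated list with one more user added.
# The list is left unchanged if the user is already in it.
append_user() {
    list=$1
    new=$2
    case ",$list," in
        *",$new,"*) echo "$list" ;;
        *) echo "${list:+$list,}$new" ;;
    esac
}

# Usage on the controller:
#   users=$(scontrol show res example_res | current_users)
#   scontrol update ReservationName=example_res users="$(append_user "$users" user3)"
```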
To change the end date for a reservation, we would run:
root@slurm:~# scontrol update ReservationName=example_res endtime=2018-11-09T16:00:00
If a Node Has Jobs Running
You will not be able to create a reservation on a node that currently has running jobs. We must set the node to a drain state and wait for the jobs to complete.
root@slurm:~# scontrol create reservation starttime=2018-08-20T01:00:00 endtime=2018-11-13T16:00:00 user=jc6ub nodes=ai06
Error creating the reservation: Requested nodes are busy
root@slurm:~# scontrol update NodeName=ai06 State=DRAIN
You must specify a reason when DOWNING or DRAINING a node. Request denied
root@slurm:~# scontrol update NodeName=ai06 State=DRAIN Reason="Draining node to set reservation"
After all jobs have finished running you should be able to create the reservation.