SLURM Reservations

SLURM supports “reservations” of compute nodes. A reservation sets aside a node (or nodes) for your exclusive use. If you need a node with no other users on it, or need one for an extended period of time, we can reserve the node (or nodes) for your use only.

Getting a Reservation

Reservations for nodes can be requested by sending an email to cshelpdesk@virginia.edu. We can reserve an entire compute node or node group (ex. “cortado01” or “cortado[01-05]”).

In your request for a reservation, please be specific about:

  1. What servers are needed (ex. “cortado01-05”)
  2. The start and end time of the reservation (ex. today to Dec 31st)
  3. Who can use the reservation (a list of user IDs)
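For example, a request might read as follows (the nodes, dates, and user IDs below are placeholders):

Please reserve cortado[01-03] from 2024-03-01 to 2024-03-14 for users abc1de and xyz9fg.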

Note: the requested node must be free of any running jobs before a reservation can take effect. If you request a reservation on a node where you currently have a job running, we will create the reservation, but it will not start until that job completes. If you want the reservation to start right away, please cancel your job ('scancel') before requesting the reservation.
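For example, you can list your jobs and cancel the one occupying the node before sending your request (the job ID below is a placeholder):

[abc1de@portal03 ~]$ squeue -u abc1de
[abc1de@portal03 ~]$ scancel 123456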

Note: nodes may be reserved for a maximum of fourteen days, and reservations may be extended in one-week increments. Reservations that are not being used (no jobs running) will be removed.

Using reservations when submitting jobs

If you have been granted a reservation, you will receive a reservation tag: a short string that identifies the reservation when you submit jobs. This string must be included in any srun or sbatch submission via the flag --reservation=....

If your tag is “abc1de_4”, then you would submit jobs using the flag --reservation=abc1de_4

This can be done at the command line:

[abc1de@portal03 ~]$ sbatch --reservation=abc1de_4 slurm.sh
...
[abc1de@portal03 ~]$ srun --reservation=abc1de_4 slurm.sh

Or you can include the flag in the header of your sbatch file:

[abc1de@portal03 ~]$ cat reservation.sh
#!/bin/bash
#SBATCH --job-name="Reservation Example" 
#
#SBATCH --error="stderr.txt"
#SBATCH --output="stdout.txt"
#
#SBATCH --reservation=abc1de_4               <- Include reservation tag
...
[abc1de@portal03 ~]$ sbatch reservation.sh

If the reserved node is in a queue (“partition”) other than the default “main” partition, you must also specify the partition in your job script or on the command line:

[abc1de@portal03 ~]$ sbatch --reservation=abc1de_4 -p gpu slurm.sh
...
[abc1de@portal03 ~]$ srun --reservation=abc1de_4 -p gpu slurm.sh

Listing Reservations

You can see a listing of all active reservations by using the scontrol command:

[abc1de@portal03 ~]$ scontrol show reservation
ReservationName=abc1de_4 StartTime=2019-06-25T00:00:00 EndTime=2019-07-02T16:00:00 
   Nodes=slurm1 NodeCnt=1 CoreCnt=12 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   TRES=cpu=24
   Users=abc1de Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
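If many reservations are active, you can show just one by giving its name (using the example tag from above):

[abc1de@portal03 ~]$ scontrol show reservation abc1de_4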

Avoiding Default Cluster Limits with Reservations

The CS cluster has a default SLURM QoS named csdefault, which limits the number of concurrent jobs a user may run. Jobs submitted to a reservation can bypass this limit by using the QoS csresnolim; add the parameter -q csresnolim or --qos=csresnolim to your srun commands or sbatch scripts.

This QoS can only be used with a reservation; that is, you must also include --reservation=abc1de_4 (your own reservation tag) in your srun commands or sbatch scripts.
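For example, a submission using both flags might look like this (with the example tag from above):

[abc1de@portal03 ~]$ sbatch --reservation=abc1de_4 --qos=csresnolim slurm.sh

Or in the header of your sbatch file:

#SBATCH --reservation=abc1de_4
#SBATCH --qos=csresnolim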

Deleting idle reservations

Computing resources are finite, and there is considerable demand for nodes, especially the higher-end GPU nodes. Reserving a node means no one else can use it, denying that resource to others. If the support staff see that a reserved node has sat idle for a few hours, they will delete the reservation.
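You can check whether any jobs are currently running under your reservation by filtering squeue on the reservation name (using the example tag from above):

[abc1de@portal03 ~]$ squeue --reservation=abc1de_4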
