SLURM Reservations
SLURM provides “reservations” of compute nodes. A reservation reserves a node (or nodes) for your exclusive use. So if you need to use a node without any other users on the node, or for an extended period of time, we can reserve the node (or nodes) for your use only.
Getting a Reservation
Reservations for nodes can be requested by sending an email to cshelpdesk@virginia.edu. We can reserve an entire compute node or node group (ex. “cortado01” or “cortado[01-05]”).
In your request for a reservation, please be specific about:
- What servers are needed (ex. “cortado01-05”)
- The start and end time of the reservation (ex. today to Dec 31st)
- Who can use the reservation (a list of user IDs)
Note: the requested node must be free of any running jobs before a reservation is started. So if you are requesting a reservation on a node upon which you are running a job, we will create the reservation but it will not take effect until the job completes on the node. So please cancel ('scancel') your job before requesting a reservation.
Note: nodes may be reserved for a maximum of fourteen days, and reservations may be extended in one-week increments. Reservations that are not being used (no jobs running) will be removed.
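Before sending a request, it can help to confirm that none of your own jobs are still on the node, since a reservation does not take effect until they finish. The sketch below is one possible way to do that; the function names my_jobs_on_node and cancel_my_jobs_on_node are hypothetical helpers, not part of SLURM, and the node name is only an example.

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of your own jobs allocated to a node.
#   --noheader   suppress the column header
#   --nodelist   only report jobs allocated to this node
#   --format=%i  print the job ID only
my_jobs_on_node() {
    squeue --noheader --user="$USER" --nodelist="$1" --format="%i"
}

# Hypothetical helper: cancel each of those jobs in turn.
cancel_my_jobs_on_node() {
    local id
    for id in $(my_jobs_on_node "$1"); do
        scancel "$id"
    done
}

# Example (assumed node name):
#   my_jobs_on_node cortado01          # review the list first
#   cancel_my_jobs_on_node cortado01   # then cancel
```

Listing the jobs before cancelling them is deliberate: scancel is not reversible, so it is worth checking the IDs first.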
Using Reservations When Submitting Jobs
If you have been granted a reservation, you will receive a reservation tag, a short string that is required to submit jobs. This string must be included in any srun or sbatch submission via the --reservation=... flag.
For example, if your tag is “abc1de_4”, you would submit jobs using the flag --reservation=abc1de_4
This can be done at the command line:
[abc1de@portal03 ~]$ sbatch --reservation=abc1de_4 slurm.sh
...
[abc1de@portal03 ~]$ srun --reservation=abc1de_4 slurm.sh
Or you can include the flag in the header of your sbatch file:
[abc1de@portal03 ~]$ cat reservation.sh
#!/bin/bash
#SBATCH --job-name="Reservation Example"
#
#SBATCH --error="stderr.txt"
#SBATCH --output="stdout.txt"
#
#SBATCH --reservation=abc1de_4     <- Include reservation tag
...
[abc1de@portal03 ~]$ sbatch reservation.sh
Listing Reservations
You can see a listing of all active reservations by using the scontrol command:
[abc1de@portal03 ~]$ scontrol show reservation
ReservationName=abc1de_4 StartTime=2019-06-25T00:00:00 EndTime=2019-07-02T16:00:00
   Nodes=slurm1 NodeCnt=1 CoreCnt=12 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   TRES=cpu=24
   Users=abc1de Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
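If you only need one value from that listing (for example, whether your reservation is active yet), the Key=Value output can be filtered with standard tools. The following is a minimal sketch; res_field is a hypothetical helper name, and it assumes that the values you query contain no spaces, which holds for the fields in the listing above.

```shell
#!/bin/bash
# Hypothetical helper: read "Key=Value" pairs on stdin and print the value
# of the requested key. Only the first match is printed.
res_field() {
    local key="$1"
    # Put each Key=Value pair on its own line, then match on the key name.
    # substr() keeps everything after the first "=", so TRES=cpu=24 yields cpu=24.
    tr ' ' '\n' | awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2); exit }'
}

# Example (assumed reservation tag):
#   scontrol show reservation abc1de_4 | res_field State
#   scontrol show reservation abc1de_4 | res_field EndTime
```

Passing the reservation name to scontrol limits the output to a single reservation, so the first match is the one you asked about.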
Avoiding Default Cluster Limits with Reservations
The CS cluster has a default SLURM QoS named csdefault, which limits the number of concurrent jobs a user may run. Jobs submitted under a reservation can bypass this limit by using the QoS csresnolim, passed to srun commands or set in sbatch scripts with the parameter -q csresnolim or --qos=csresnolim.
This QoS can only be used together with a reservation; that is, you must also include --reservation=abc1de_4 in your srun commands or sbatch scripts.
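Because the two flags always travel together, one option is to wrap them in a small shell function so that neither is forgotten. This is only a sketch: reserved_sbatch is a hypothetical name, and the tag and script name in the example are the illustrative ones used earlier on this page.

```shell
#!/bin/bash
# Hypothetical wrapper: submit a batch job under a reservation with the
# csresnolim QoS. The first argument is the reservation tag; everything
# after it is passed to sbatch unchanged.
reserved_sbatch() {
    local tag="$1"
    shift
    sbatch --reservation="$tag" --qos=csresnolim "$@"
}

# Example (assumed tag and script name):
#   reserved_sbatch abc1de_4 slurm.sh
```

The same pair of options can instead be placed in the script header as two #SBATCH lines, which keeps the submission command itself unchanged.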