compute_slurm_reservations
compute_slurm_reservations [2019/06/17 13:59] – ktm5j | compute_slurm_reservations [2024/02/26 17:33] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
+ | ==== SLURM Reservations ==== | ||
+ | |||
+ | SLURM provides " | ||
+ | |||
+ | === Getting a Reservation === | ||
+ | |||
+ | Reservations for nodes can be requested by sending an email to < | ||
+ | |||
+ | In your request for a reservation, | ||
+ | |||
+ | - What servers are needed (ex. " | ||
+ | - The start and end time of the reservation (ex. today to Dec 31st) | ||
+ | - Who can use the reservation (a list of user IDs) | ||
+ | |||
+ | //Note: the requested node must be free of running jobs before a reservation can start. If you request a reservation on a node where you are already running a job, we will create the reservation, but it will not take effect until that job completes. So please cancel (' | ||
+ | |||
+ | //Note: nodes may be reserved for a maximum of fourteen days and extended in one-week increments. Reservations that are not being used (no jobs running) will be removed.// | ||
+ | |||
+ | === Using Reservations When Submitting Jobs === | ||
+ | |||
+ | If you have been granted a reservation, you will receive a reservation tag: a short string that is required to submit jobs to the reserved nodes. This string must be included in any '' | ||
+ | |||
+ | If our tag is " | ||
+ | |||
+ | This can be done at the command line: | ||
+ | |||
+ | < | ||
+ | [abc1de@portal03 ~]$ sbatch --reservation=abc1de_4 slurm.sh | ||
+ | ... | ||
+ | [abc1de@portal03 ~]$ srun --reservation=abc1de_4 slurm.sh | ||
+ | </ | ||
+ | |||
+ | Or you can include the flag in the header of your sbatch file: | ||
+ | |||
+ | < | ||
+ | [abc1de@portal03 ~]$ cat reservation.sh | ||
+ | #!/bin/bash | ||
+ | #SBATCH --job-name=" | ||
+ | # | ||
+ | #SBATCH --error=" | ||
+ | #SBATCH --output=" | ||
+ | # | ||
+ | #SBATCH --reservation=abc1de_4 | ||
+ | ... | ||
+ | [abc1de@portal03 ~]$ sbatch reservation.sh | ||
+ | </ | ||
+ | |||
+ | If the reserved node is in a queue (" | ||
+ | < | ||
+ | [abc1de@portal03 ~]$ sbatch --reservation=abc1de_4 -p gpu slurm.sh | ||
+ | ... | ||
+ | [abc1de@portal03 ~]$ srun --reservation=abc1de_4 -p gpu slurm.sh | ||
+ | </ | ||
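The two command forms above (with and without a partition) can be wrapped in a small helper. This is only a sketch: the function name `reservation_cmd` is hypothetical and not part of SLURM, and it prints the command instead of running it, so you can check it before submitting.

```shell
# Hypothetical helper (not part of SLURM): print the sbatch command
# for a reservation tag, a job script, and an optional partition.
# Usage: reservation_cmd TAG SCRIPT [PARTITION]
reservation_cmd() (
    tag=$1
    script=$2
    partition=${3:-}
    if [ -n "$partition" ]; then
        # Reserved node lives in a named partition: add -p
        printf 'sbatch --reservation=%s -p %s %s\n' "$tag" "$partition" "$script"
    else
        printf 'sbatch --reservation=%s %s\n' "$tag" "$script"
    fi
)
```

For example, `reservation_cmd abc1de_4 slurm.sh gpu` prints the same `sbatch` line as the partition example above.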
+ | |||
+ | === Listing Reservations === | ||
+ | |||
+ | You can see a listing of all active reservations by using the '' | ||
+ | |||
+ | < | ||
+ | [abc1de@portal03 ~]$ scontrol show reservation | ||
+ | ReservationName=abc1de_4 StartTime=2019-06-25T00: | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | </ | ||
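If you only need the reservation tags, the listing can be filtered. The sketch below assumes that each record in the `scontrol show reservation` output begins with a `ReservationName=` field at the start of a line, as in the output above; the function name `reservation_names` is hypothetical.

```shell
# Hypothetical filter: read `scontrol show reservation` output on stdin
# and print one reservation name per line.
reservation_names() {
    # Keep only lines that start a record and strip everything
    # except the value of the ReservationName field.
    sed -n 's/^ReservationName=\([^ ]*\).*/\1/p'
}
```

Typical use would be `scontrol show reservation | reservation_names`.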
+ | |||
+ | === Avoiding Default Cluster Limits with Reservations === | ||
+ | The CS cluster has a default SLURM QoS named '' | ||
+ | |||
+ | This QoS can only be used together with a reservation; that is, you must also include '' | ||
+ | === Deleting Idle Reservations === | ||
+ | Computing resources are finite, and demand for nodes is considerable, especially for the higher-end GPU nodes. Reserving a node prevents anyone else from using it. If support staff see that a reserved node has sat idle for a few hours, they will delete the reservation. | ||
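To check whether your own reservation currently looks idle, you can inspect the job list for it. This is a minimal sketch, assuming "idle" means that `squeue` reports no running jobs in the reservation; the function name `reservation_is_idle` is hypothetical, and support staff may judge idleness differently.

```shell
# Hypothetical check: read `squeue --noheader` output on stdin and
# succeed (exit status 0) only when it contains no job lines.
reservation_is_idle() {
    # Any non-whitespace character means at least one job is listed.
    ! grep -q '[^[:space:]]'
}
```

For example: `squeue --noheader --states=RUNNING --reservation=abc1de_4 | reservation_is_idle && echo "abc1de_4 has no running jobs"`.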