Differences
This shows you the differences between two versions of the page.
compute_slurm [2021/02/06 20:05] pgh5a |
compute_slurm [2022/04/04 14:25] (current) |
||
---|---|---|---|
Line 5: | Line 5: | ||
The Computer Science Department uses a "job scheduler" called [[https://en.wikipedia.org/wiki/Slurm_Workload_Manager|SLURM]]. The purpose of a job scheduler is to allocate computational resources (servers) to users who submit "jobs" to a queue. The job scheduler looks at the requirements stated in the job's script and allocates to the job a server (or servers) which matches the requirements specified in the job script. For example, if the job script specifies that a job needs 192GB of memory, the job scheduler will find a server with at least that much memory free. | The Computer Science Department uses a "job scheduler" called [[https://en.wikipedia.org/wiki/Slurm_Workload_Manager|SLURM]]. The purpose of a job scheduler is to allocate computational resources (servers) to users who submit "jobs" to a queue. The job scheduler looks at the requirements stated in the job's script and allocates to the job a server (or servers) which matches the requirements specified in the job script. For example, if the job script specifies that a job needs 192GB of memory, the job scheduler will find a server with at least that much memory free. | ||
+ | The job scheduler supports a direct login option (see below) that allows direct interactive logins to servers controlled by the scheduler, without the need for a job script. | ||
+ | |||
+ | As of 10-Jan-2022, the SLURM job scheduler was updated to the latest revision which enforces memory limits. As a result, jobs that exceed their requested memory size will be terminated by the scheduler. | ||
+ | |||
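To illustrate the memory enforcement described above, a job script can state its memory request explicitly so that the limit being enforced is one the user chose. The sketch below is a minimal example, not a script taken from this page: ''%%--mem%%'' and ''%%-t%%'' are standard ''%%sbatch%%'' options, while the values and the body of the script are assumptions chosen for illustration.

```shell
#!/bin/bash
# Illustrative job script (assumed values, not from the original page).
#SBATCH --mem=192G   # memory request; a job that exceeds it is terminated
#SBATCH -t 10        # time limit in minutes

# Placeholder workload: report which node the job landed on.
node_name=$(uname -n)
echo "Running on ${node_name}"
```

Submitted with ''%%sbatch%%'' from portal, the ''%%#SBATCH%%'' lines are read as options; run directly in a shell they are ordinary comments.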
+ | As of 02-Apr-2022, the time limit enforcement policy within SLURM has changed. Every job submitted with a time limit is allowed to run for 60 minutes past the user-submitted limit. For example, if a user submits a job with a time limit of 10 minutes using the parameter "-t 10", SLURM will kill the job after 70 minutes. | ||
+ | |||
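The arithmetic behind the time limit example above can be sketched in a few lines of shell. The variable names are hypothetical and this is not a SLURM command; it only restates the rule that the kill time is the requested limit plus 60 minutes.

```shell
# Hypothetical helper values; nothing here talks to SLURM.
requested_minutes=10   # the value passed with "-t 10"
grace_minutes=60       # extension applied to the submitted limit

# The job is killed once the requested limit plus the extension has elapsed.
effective_minutes=$((requested_minutes + grace_minutes))
echo "Job is killed after ${effective_minutes} minutes"
```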
=== Using SLURM === | === Using SLURM === | ||
**[[https://slurm.schedmd.com/pdfs/summary.pdf| | **[[https://slurm.schedmd.com/pdfs/summary.pdf| | ||
Slurm Commands Cheat Sheet]]** | Slurm Commands Cheat Sheet]]** | ||
+ | |||
+ | The SLURM commands below are ONLY available on the portal cluster of servers. They are not installed on the gpusrv* servers or on the SLURM-controlled nodes themselves. | ||
=== Information Gathering === | === Information Gathering === | ||
Line 49: | Line 57: | ||
=== Jobs === | === Jobs === | ||
- | To use SLURM resources, you must submit your jobs (program/script/etc.) to the SLURM controller. The controller will then send your job to compute nodes for execution, after which time your results will be returned. | + | To use SLURM resources, you must submit your jobs (program/script/etc.) to the SLURM controller. The controller will then send your job to compute nodes for execution, after which time your results will be returned. There is also a //direct login option// (see below) that doesn't require a job script. |
Users can submit SLURM jobs from ''%%portal.cs.virginia.edu%%''. You can submit jobs using the commands [[https://slurm.schedmd.com/srun.html|srun]] or [[https://slurm.schedmd.com/sbatch.html|sbatch]]. Let's look at a very simple example script and ''%%sbatch%%'' command. | Users can submit SLURM jobs from ''%%portal.cs.virginia.edu%%''. You can submit jobs using the commands [[https://slurm.schedmd.com/srun.html|srun]] or [[https://slurm.schedmd.com/sbatch.html|sbatch]]. Let's look at a very simple example script and ''%%sbatch%%'' command. | ||
Line 168: | Line 176: | ||
Reservations for specific resources or nodes can be made by submitting a request to <cshelpdesk@virginia.edu>. For more information about using reservations, see the main article on [[compute_slurm_reservations|SLURM Reservations]] | Reservations for specific resources or nodes can be made by submitting a request to <cshelpdesk@virginia.edu>. For more information about using reservations, see the main article on [[compute_slurm_reservations|SLURM Reservations]] | ||
+ | |||
+ | === Job Accounting === | ||
+ | The SLURM scheduler has SLURM's accounting features enabled, so users can run the ''%%sacct%%'' command to find job accounting information such as job IDs, job names, the partition a job ran on, allocated CPUs, job state, and exit codes. Many other options are supported; type ''%%man sacct%%'' on portal to see them all. | ||
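As a rough sketch of the accounting query described above, the fields listed there can be requested with the ''%%--format%%'' option of ''%%sacct%%''. The field list is an assumption chosen to match that description, and the check for ''%%sacct%%'' is only there so the snippet fails gracefully on a machine other than portal.

```shell
# Fields matching the description above: job id, name, partition,
# allocated CPUs, state, and exit code.
fields="JobID,JobName,Partition,AllocCPUS,State,ExitCode"

if command -v sacct >/dev/null 2>&1; then
  # Show accounting records for the current user's recent jobs.
  sacct --format="$fields"
else
  echo "sacct not found; run this on portal" >&2
fi
```

To look at a single job, the ''%%-j%%'' option followed by a job id can be added to the same command.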
=== Note on Modules in Slurm === | === Note on Modules in Slurm === |