Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
compute_slurm [2021/03/05 17:06]
pgh5a
compute_slurm [2022/04/04 14:25] (current)
Line 5: Line 5:
 The Computer Science Department uses a "job scheduler"​ called [[https://​en.wikipedia.org/​wiki/​Slurm_Workload_Manager|SLURM]]. ​ The purpose of a job scheduler is to allocate computational resources (servers) to users who submit "​jobs"​ to a queue. The job scheduler looks at the requirements stated in the job's script and allocates to the job a server (or servers) which matches the requirements specified in the job script. For example, if the job script specifies that a job needs 192GB of memory, the job scheduler will find a server with at least that much memory free. The Computer Science Department uses a "job scheduler"​ called [[https://​en.wikipedia.org/​wiki/​Slurm_Workload_Manager|SLURM]]. ​ The purpose of a job scheduler is to allocate computational resources (servers) to users who submit "​jobs"​ to a queue. The job scheduler looks at the requirements stated in the job's script and allocates to the job a server (or servers) which matches the requirements specified in the job script. For example, if the job script specifies that a job needs 192GB of memory, the job scheduler will find a server with at least that much memory free.
  
 +The job scheduler supports a direct login option (see below) that allows direct interactive logins to servers controlled by the scheduler, without the need for a job script.
 +
 +As of 10-Jan-2022,​ the SLURM job scheduler was updated to the latest revision which enforces memory limits. As a result, jobs that exceed their requested memory size will be terminated by the scheduler.
 +
 +As of 02-Apr-2022,​ the time limit enforcement policy within SLURM has changed. All jobs submitted with time limits are extended 60 minutes past the user-submitted time limit. E.g. If the user submits a job with a time limit of 10 minutes using parameter "-t 10", SLURM will kill the job after 70 minutes.
 + 
 === Using SLURM === === Using SLURM ===
 **[[https://​slurm.schedmd.com/​pdfs/​summary.pdf| **[[https://​slurm.schedmd.com/​pdfs/​summary.pdf|
Line 51: Line 57:
 === Jobs === === Jobs ===
  
-To use SLURM resources, you must submit your jobs (program/​script/​etc.) to the SLURM controller. ​ The controller will then send your job to compute nodes for execution, after which time your results will be returned.+To use SLURM resources, you must submit your jobs (program/​script/​etc.) to the SLURM controller. ​ The controller will then send your job to compute nodes for execution, after which time your results will be returned. There is also an //direct login option// (see below) that doesn'​t require a job script.
  
 Users can submit SLURM jobs from ''​%%portal.cs.virginia.edu%%''​. ​ You can submit jobs using the commands [[https://​slurm.schedmd.com/​srun.html|srun]] or [[https://​slurm.schedmd.com/​sbatch.html|sbatch]]. ​ Let's look at a very simple example script and ''​%%sbatch%%''​ command. Users can submit SLURM jobs from ''​%%portal.cs.virginia.edu%%''​. ​ You can submit jobs using the commands [[https://​slurm.schedmd.com/​srun.html|srun]] or [[https://​slurm.schedmd.com/​sbatch.html|sbatch]]. ​ Let's look at a very simple example script and ''​%%sbatch%%''​ command.
Line 170: Line 176:
  
 Reservations for specific resources or nodes can be made by submitting a request to <​cshelpdesk@virginia.edu>​. ​ For more information about using reservations,​ see the main article on [[compute_slurm_reservations|SLURM Reservations]] Reservations for specific resources or nodes can be made by submitting a request to <​cshelpdesk@virginia.edu>​. ​ For more information about using reservations,​ see the main article on [[compute_slurm_reservations|SLURM Reservations]]
 +
 +=== Job Accounting ===
 +The SLURM scheduler implements the Accounting features of slurm. So users can execute the ''​%%sacct%%''​ command to find job accounting information,​ like job ids, job names, partition run upon, allocated CPUs, job state, and exit codes. There are numerous other options supported. Type ''​%%man sacct%%''​ on portal to see all the options.  ​
  
 === Note on Modules in Slurm === === Note on Modules in Slurm ===
  • compute_slurm.1614963972.txt.gz
  • Last modified: 2021/03/05 17:06
  • by pgh5a