==== Scheduling a Job using the SLURM job scheduler ====
  
{{ ::introtoslurm.pdf | Intro to SLURM slides }}
  
The Computer Science Department uses a "job scheduler" called [[https://en.wikipedia.org/wiki/Slurm_Workload_Manager|SLURM]].  The purpose of a job scheduler is to allocate computational resources (servers) to users who submit "jobs" to a queue.  The scheduler looks at the requirements stated in the job's script and allocates to the job a server (or servers) that matches those requirements.  For example, if the job script specifies that a job needs 192GB of memory, the scheduler will find a server with at least that much memory free.
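For instance, a memory requirement like the one in that example can be stated directly in the job script.  The sketch below is only illustrative (the 192GB figure, the job name, and ''%%./my_program%%'' are placeholders); ''%%--mem%%'' is the standard SLURM option for per-node memory:

<code bash>
#!/bin/bash
#SBATCH --job-name="memory-example"   # illustrative job name
#SBATCH --mem=192G                    # ask for a node with at least 192GB of memory free
./my_program                          # placeholder for the program you actually run
</code>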
  
=== Using SLURM ===
**[[https://slurm.schedmd.com/pdfs/summary.pdf|Slurm Commands Cheat Sheet]]**
  
=== Information Gathering ===
  
To view information about compute nodes in the SLURM system, use the command ''%%sinfo%%''.
  
<code>
[abc1de@portal04 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*        up   infinite      3  drain falcon[3-5]
main*        up   infinite     27   idle cortado[01-10],falcon[1-2,6-10],lynx[08-12],slurm[1-5]
gpu          up   infinite      4    mix ai[02-03,05],lynx07
gpu          up   infinite      1  alloc ai04
gpu          up   infinite     12   idle ai[01,06],lynx[01-06],ristretto[01-04]
</code>
  
With ''%%sinfo%%'' we can see a listing of the job queues, or "partitions", and a list of nodes associated with these partitions.  A partition is a grouping of nodes; for example, our //main// partition is a group of all general purpose nodes, and the //gpu// partition is a group of nodes that each contain GPUs.  Sometimes hosts are listed in two or more partitions.
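If you are only interested in one partition, ''%%sinfo%%'' can be limited with the same ''%%-p%%'' flag used later for ''%%srun%%'' and ''%%sbatch%%'', for example (output omitted; it matches the //gpu// lines shown above):

<code>
[abc1de@portal04 ~]$ sinfo -p gpu
</code>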
  
To view jobs running on the queue, we can use the command ''%%squeue%%''.  Say we have submitted one job to the main partition; running ''%%squeue%%'' will look like this:
  
<code>
abc1de@portal01 ~ $ squeue
             JOBID PARTITION     NAME     USER    ST     TIME  NODES NODELIST(REASON)
            467039      main    my_job    abc1de  R      0:06      1 artemis1
</code>
  
If we run ''%%sinfo%%'' again while the job is running, the allocated node no longer appears in the //idle// node list:
  
<code>
abc1de@portal01 ~ $ sinfo
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*            up   infinite     37   idle hermes[1-4],artemis[2-7],slurm[1-5],nibbler[1-4],trillian[1-3],granger[1-6],granger[7-8],ai0[1-6]
</code>
  
=== Jobs ===
  
To use SLURM resources, you must submit your jobs (program/script/etc.) to the SLURM controller.  The controller will then send your job to compute nodes for execution, after which time your results will be returned.
  
Users can submit SLURM jobs from ''%%portal.cs.virginia.edu%%''.  From a shell, you can submit jobs using the commands [[https://slurm.schedmd.com/srun.html|srun]] or [[https://slurm.schedmd.com/sbatch.html|sbatch]].  Let's look at a very simple example script and ''%%sbatch%%'' command.
  
Here is our script; all it does is print the hostname of the server running the script.  We must add ''%%SBATCH%%'' options to our script to handle various SLURM options.
<code bash>
#!/bin/bash
# --- this job will be run on any available node
# and simply output the node's hostname to
# my_job.output
#SBATCH --job-name="Slurm Simple Test Job"
#SBATCH --error="my_job.err"
#SBATCH --output="my_job.output"
echo "$HOSTNAME"
</code>
  
We run the script with ''%%sbatch%%'' and the results will be put in the file we specified with ''%%--output%%''.  If no output file is specified, output will be saved to a file named after the SLURM job id (''%%slurm-<jobid>.out%%'').
  
<code>
[abc1de@portal04 ~]$ sbatch slurm.test
Submitted batch job 640768
[abc1de@portal04 ~]$ more my_job.output
cortado06
</code>
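If you would rather not overwrite ''%%my_job.output%%'' on every run, SLURM filename patterns such as ''%%%j%%'' (the job id) can be used in the ''%%--output%%'' and ''%%--error%%'' directives.  A small sketch (the filenames are just examples):

<code bash>
#SBATCH --output="my_job-%j.output"   # e.g. my_job-640768.output for job 640768
#SBATCH --error="my_job-%j.err"
</code>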
  
We can also use ''%%srun%%'' to run a command directly on one or more specific nodes.  For example, to run ''%%hostname%%'' on the five nodes ''%%slurm[1-5]%%'':
  
<code>
abc1de@portal01 ~ $ srun -w slurm[1-5] -N5 hostname
slurm4
slurm1
slurm2
slurm3
slurm5
</code>
  
=== Direct login to servers (without a job script) ===

You can use ''%%srun%%'' to log in directly to a server controlled by the SLURM job scheduler.  This can be useful for debugging purposes as well as for running your applications without using a job script.  This feature also reserves the server for your exclusive use.

We must pass the ''%%--pty%%'' option to ''%%srun%%'' so output is directed to a pseudo-terminal:

<code>
abc1de@portal ~$ srun -w cortado04 --pty bash -i -l -
abc1de@cortado04 ~$ hostname
cortado04
abc1de@cortado04 ~$
</code>

The ''%%-w%%'' argument selects the server to log in to.  The ''%%-i%%'' argument tells ''%%bash%%'' to run as an interactive shell.  The ''%%-l%%'' argument instructs bash that this is a login shell; this, along with the final ''%%-%%'', is important to reset environment variables that otherwise might cause issues when using [[linux_environment_modules|Environment Modules]].

If a node is in a partition other than the default "main" partition (for example, the "gpu" partition; see below for partition information), then you //must// specify the partition in your command, for example:
<code>
abc1de@portal ~$ srun -w lynx05 -p gpu --pty bash -i -l -
</code>

=== Terminating Jobs ===

Please be aware of jobs you start and make sure that they finish executing.  If your job does not exit gracefully, it will continue running on the server, taking up resources and preventing others from running their jobs.
  
To cancel a running job, use the ''%%scancel [jobid]%%'' command:
  
<code>
abc1de@portal01 ~ $ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            467039      main    sleep    abc1de  R       0:06      1 artemis1           <--  Running job
abc1de@portal01 ~ $ scancel 467039
</code>
  
The default signal sent to a running job is SIGTERM (terminate).  If you wish to send a different signal to the job's processes (for example, a SIGKILL, which is often needed if a SIGTERM doesn't terminate the process), use the ''%%-s%%''/''%%--signal%%'' argument to ''%%scancel%%'', e.g.:
<code>
abc1de@portal01 ~ $ scancel --signal=KILL 467039
</code>
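You can also cancel all of your own jobs at once by giving ''%%scancel%%'' your username with the ''%%-u%%'' flag (''%%abc1de%%'' below is a placeholder):

<code>
abc1de@portal01 ~ $ scancel -u abc1de
</code>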
=== Queues/Partitions ===
  
Slurm refers to job queues as //partitions//.  We group similar systems into separate queues.  For example, there is a "main" queue for general purpose systems and a "gpu" queue for systems with GPUs.  These queues can have unique constraints, such as which compute nodes they include, maximum runtime, resource limits, etc.
  
The partition is indicated by ''%%-p partname%%'' or ''%%--partition partname%%''.
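In a batch script, the partition is requested the same way through an ''%%SBATCH%%'' directive.  A minimal sketch (the //gpu// partition and job name are only examples):

<code bash>
#!/bin/bash
#SBATCH --partition=gpu               # equivalent to -p gpu on the command line
#SBATCH --job-name="partition-example"
hostname
</code>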
  
An example running the command ''%%hostname%%'' on the //main// partition; this will run on any node in the partition:
  
<code>
srun -p main hostname
</code>
  
=== Using GPUs ===
  
Slurm handles GPUs and other non-CPU computing resources using what it calls [[https://slurm.schedmd.com/gres.html|GRES]] (Generic RESources).  To use the GPU(s) on a system, with either ''%%sbatch%%'' or ''%%srun%%'', you must request them with the ''%%--gres%%'' option: specify the resource name (''%%gpu%%'') followed by ''%%:%%'' and the quantity of resources, for example ''%%--gres=gpu:2%%'' for two GPUs.
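As a sketch (the partition, GPU count, and commands are illustrative), a batch script that requests two GPUs could look like this:

<code bash>
#!/bin/bash
#SBATCH --partition=gpu               # GPU nodes live in the gpu partition
#SBATCH --gres=gpu:2                  # request two GPUs on the allocated node
#SBATCH --job-name="gpu-example"
nvidia-smi                            # example command: list the GPUs visible to the job
</code>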
  
=== Reservations ===
  
Reservations for specific resources or nodes can be made by submitting a request to <cshelpdesk@virginia.edu>.  For more information about using reservations, see the main article on [[compute_slurm_reservations|SLURM Reservations]].
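Once a reservation has been granted, it is normally selected with the standard SLURM ''%%--reservation%%'' flag on ''%%srun%%'' or ''%%sbatch%%'' (''%%my_reservation%%'' below is a placeholder for the name you are given):

<code>
abc1de@portal01 ~ $ sbatch --reservation=my_reservation slurm.test
</code>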
  
=== Note on Modules in Slurm ===

Due to the way sbatch spawns a bash session (non-login session), some init files are not loaded from ''%%/etc/profile.d%%''.  This prevents the initialization of the [[linux_environment_modules|Environment Modules]] system and prevents you from loading software modules.

To fix this, include the following line in your sbatch scripts:

<code bash>
source /etc/profile.d/modules.sh
</code>
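Putting this together, a job script that relies on Environment Modules might look like the following sketch (the ''%%gcc%%'' module is only a placeholder for whatever software your job needs):

<code bash>
#!/bin/bash
#SBATCH --job-name="modules-example"
#SBATCH --output="modules-example.output"

# initialize the Environment Modules system (see the note above)
source /etc/profile.d/modules.sh

# load the module(s) your job needs -- "gcc" is just a placeholder
module load gcc

gcc --version                         # example command using the loaded module
</code>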
  