Differences

This shows you the differences between two versions of the page.

compute_slurm [2020/05/04 14:14] pgh5a
compute_slurm [2020/05/26 20:24] (current) pgh5a [Information Gathering]
Line 23: Line 23:
 </code>
  
-With ''%%sinfo%%'' we can see a listing of the job queues or "partitions" and a list of nodes associated with these partitions.  A partition is a grouping of nodes; for example, our //main// partition is a group of all SLURM nodes that are not reserved and can be used by anyone, and the //gpu// partition is a group of nodes that each contain GPUs.  Sometimes hosts can be listed in two or more partitions.
+With ''%%sinfo%%'' we can see a listing of the job queues or "partitions" and a list of nodes associated with these partitions.  A partition is a grouping of nodes; for example, our //main// partition is a group of all general purpose nodes, and the //gpu// partition is a group of nodes that each contain GPUs.  Sometimes hosts can be listed in two or more partitions.
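
As a quick illustration, ''%%sinfo%%'' can also be restricted to a single partition with its ''%%-p%%'' flag; the partition name below is simply the //gpu// partition described above:

<code>
pgh5a@portal01 ~ $ sinfo -p gpu
</code>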
  
 To view jobs running on the queue, we can use the command ''%%squeue%%''.  Say we have submitted one job to the main partition; running ''%%squeue%%'' will look like this:
Line 30: Line 30:
 pgh5a@portal01 ~ $ squeue
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
-            467039      main    sleep    pgh5a  R       0:06      1 artemis1
+            467039      main   my_job    pgh5a  R       0:06      1 artemis1
 </code>
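
On a busy queue it can help to restrict ''%%squeue%%'' to your own jobs with its ''%%-u%%'' flag (the username below is just an example):

<code>
pgh5a@portal01 ~ $ squeue -u pgh5a
</code>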
  
Line 51: Line 51:
 To use SLURM resources, you must submit your jobs (program/script/etc.) to the SLURM controller.  The controller will then send your job to compute nodes for execution, after which your results will be returned.
  
-Users can submit SLURM jobs from any of the power servers: ''%%power1%%''-''%%power6%%''.  From a shell, you can submit jobs using the commands [[https://slurm.schedmd.com/srun.html|srun]] or [[https://slurm.schedmd.com/sbatch.html|sbatch]].  Let's look at a very simple example script and ''%%sbatch%%'' command.
+Users can submit SLURM jobs from ''%%portal.cs.virginia.edu%%''.  From a shell, you can submit jobs using the commands [[https://slurm.schedmd.com/srun.html|srun]] or [[https://slurm.schedmd.com/sbatch.html|sbatch]].  Let's look at a very simple example script and ''%%sbatch%%'' command.
  
 Here is our script; all it does is print the hostname of the server running the script.  We must add ''%%SBATCH%%'' options to our script to handle various SLURM options.
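
The script itself falls outside the changed lines shown in this diff, but a minimal sketch of what ''%%test.sh%%'' might contain is below.  The ''%%SBATCH%%'' option values are assumptions chosen to match the job name and output files that appear in the examples that follow:

<code>
#!/bin/bash
#SBATCH --job-name=my_job         # job name shown by squeue (assumed)
#SBATCH --output=my_job.output    # file that receives the job's stdout (assumed)
#SBATCH --error=my_job.err        # file that receives the job's stderr (assumed)

# Print the hostname of the compute node running this job
hostname
</code>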
Line 73: Line 73:
  
 <code>
-pgh5a@power3 ~ $ cd slurm-test/
-pgh5a@power3 ~/slurm-test $ chmod +x test.sh
-pgh5a@power3 ~/slurm-test $ sbatch test.sh
+pgh5a@portal01 ~ $ cd slurm-test/
+pgh5a@portal01 ~/slurm-test $ chmod +x test.sh
+pgh5a@portal01 ~/slurm-test $ sbatch test.sh
 Submitted batch job 466977
-pgh5a@power3 ~/slurm-test $ ls
+pgh5a@portal01 ~/slurm-test $ ls
 my_job.err  my_job.output  test.sh
-pgh5a@power3 ~/slurm-test $ cat my_job.output
+pgh5a@portal01 ~/slurm-test $ cat my_job.output
 slurm1
 </code>
Line 86: Line 86:
  
 <code>
-pgh5a@power3 ~ $ srun -w slurm[1-5] -N5 hostname
+pgh5a@portal01 ~ $ srun -w slurm[1-5] -N5 hostname
 slurm4
 slurm1
Line 101: Line 101:
  
 <code>
-ktm5j@power1 ~ $ squeue
+ktm5j@portal01 ~ $ squeue
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             467039      main    sleep    ktm5j  R       0:06      1 artemis1           <--  Running job
-ktm5j@power1 ~ $ scancel 467039
+ktm5j@portal01 ~ $ scancel 467039
 </code>
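
''%%scancel%%'' can also cancel every job a user currently has in the queue with its ''%%-u%%'' flag (the username below is just an example):

<code>
ktm5j@portal01 ~ $ scancel -u ktm5j
</code>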
  
Line 125: Line 125:
 </code>
  
-An example running the command ''%%hostname%%'' on the //intel// partition; this will run on any node in the partition:
+An example running the command ''%%hostname%%'' on the //main// partition; this will run on any node in the partition:
  
 <code>
-srun -p intel hostname
+srun -p main hostname
 </code>
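
The ''%%-p%%'' flag combines with the options shown earlier; for example, reusing ''%%-N%%'' runs the command on several nodes of the partition at once (a sketch, assuming the //main// partition has at least two idle nodes):

<code>
srun -p main -N2 hostname
</code>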
  
Line 149: Line 149:
 ==== Interactive Shell ====
  
-We can use ''%%srun%%'' to spawn an interactive shell on a SLURM compute node.  While this can be useful for debugging purposes, this is **not** how you should typically use the SLURM system.  To spawn a shell we must pass the ''%%--pty%%'' option to ''%%srun%%'' so output is directed to a pseudo-terminal:
+We can use ''%%srun%%'' to spawn an interactive shell on a server controlled by the SLURM job scheduler.  This can be useful for debugging purposes as well as for running jobs without using a job script.  Creating an interactive session also reserves the node for your exclusive use.
+
+To spawn a shell we must pass the ''%%--pty%%'' option to ''%%srun%%'' so output is directed to a pseudo-terminal:
  
 <code>
-ktm5j@power3 ~/slurm-test $ srun -w slurm1 --pty bash -i -l -
-ktm5j@slurm1 ~/slurm-test $ hostname
+pgh5a@portal ~$ srun -w slurm1 --pty bash -i -l -
+pgh5a@slurm1 ~$ hostname
 slurm1
-ktm5j@slurm1 ~/slurm-test $
+pgh5a@slurm1 ~$
 </code>
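
When you are done with an interactive session, exiting the shell ends the job and releases the reserved node; the prompts below follow the example above:

<code>
pgh5a@slurm1 ~$ exit
pgh5a@portal ~$
</code>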
  
Line 162: Line 164:
 If a node is in a partition other than the default "main" partition (for example, the "gpu" partition), then you must specify that partition in your command:
 <code>
-ktm5j@power3 ~/slurm-test $ srun -w lynx05 -p gpu --pty bash -i -l -
+pgh5a@portal ~$ srun -w lynx05 -p gpu --pty bash -i -l -
 </code>
  