compute_slurm [2021/04/02 17:05] pgh5a (previous revision: 2020/09/24 15:49, pgh5a)
**[[https://slurm.schedmd.com/pdfs/summary.pdf|Slurm Commands Cheat Sheet]]**

The SLURM commands below are available ONLY on the portal cluster of servers. They are not installed on the gpusrv* servers or on the SLURM-controlled nodes themselves.
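
If you are not sure whether the machine you are on has these commands, a quick check helps. This is a minimal sketch, not a site-provided tool: ''%%check_slurm_commands%%'' is a hypothetical function name, and it only tests whether ''%%squeue%%'', ''%%sbatch%%'', ''%%srun%%'' and ''%%scancel%%'' are on your ''%%PATH%%''.

<code bash>
#!/bin/bash
# Hypothetical helper (not part of SLURM): check that the SLURM client
# commands are on PATH. Expect success on the portal servers and failure
# on gpusrv* or on the SLURM-controlled nodes.
check_slurm_commands() {
    local cmds=("$@")
    local cmd missing=0
    # With no arguments, check the commands used on this page.
    if [ "${#cmds[@]}" -eq 0 ]; then
        cmds=(squeue sbatch srun scancel)
    fi
    for cmd in "${cmds[@]}"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_slurm_commands; then
    echo "SLURM commands available on $(uname -n)"
else
    echo "SLURM commands not found here; log in to portal.cs.virginia.edu" >&2
fi
</code>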
  
=== Information Gathering ===
<code>
abc1de@portal01 ~ $ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            467039      main   my_job   abc1de  R       0:06      1 artemis1
</code>
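
When scripting against the queue, it is usually easier to ask ''%%squeue%%'' for only the fields you need than to parse the table above. A minimal sketch, assuming the standard ''%%squeue%%'' options ''%%-h%%'' (no header), ''%%-u%%'' (user), ''%%-t%%'' (job state) and ''%%-o%%'' (output format); ''%%my_running_jobs%%'' is a hypothetical name:

<code bash>
#!/bin/bash
# Hypothetical helper: print the job IDs of your running jobs, one per line.
#   -h          suppress the header line
#   -u "$USER"  only your own jobs
#   -t RUNNING  only jobs in the R (running) state
#   -o '%i'     print just the job ID column
my_running_jobs() {
    squeue -h -u "$USER" -t RUNNING -o '%i'
}
</code>

For example, ''%%my_running_jobs | wc -l%%'' counts your running jobs.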
  
=== Jobs ===
  
To use SLURM resources, you must submit your jobs (program/script/etc.) to the SLURM controller. The controller then sends your job to compute nodes for execution, after which your results are returned. There is also a //direct login option// (see below) that doesn't require a job script.
  
Users can submit SLURM jobs from ''%%portal.cs.virginia.edu%%''. Jobs are submitted with either [[https://slurm.schedmd.com/srun.html|srun]] or [[https://slurm.schedmd.com/sbatch.html|sbatch]]. Let's look at a very simple example script and the ''%%sbatch%%'' command that submits it.
  
Here is our script; all it does is print the hostname of the server running it. SLURM options are set in the script itself with ''%%#SBATCH%%'' lines.
<code bash>
#!/bin/bash
# --- this job will be run on any available node
# and simply output the node's hostname to
# my_job.output
# Job name shown by squeue:
#SBATCH --job-name="Slurm Simple Test Job"
# Where to write stderr and stdout:
#SBATCH --error="my_job.err"
#SBATCH --output="my_job.output"
echo "$HOSTNAME"
</code>
  
-Let's put this in a directory called ''​%%slurm-test%%''​ in our home directory.  ​We run the script with ''​%%sbatch%%''​ and the results will be put in the file we specified with ''​%%--output%%''​. ​ If no output file is specified, output will be saved to a file with the same name as the SLURM jobid.+We run the script with ''​%%sbatch%%''​ and the results will be put in the file we specified with ''​%%--output%%''​. ​ If no output file is specified, output will be saved to a file with the same name as the SLURM jobid.
  
<code>
[abc1de@portal04 ~]$ sbatch slurm.test
Submitted batch job 640768
[abc1de@portal04 ~]$ more my_job.output
cortado06
</code>
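
The simple script can be extended with more ''%%#SBATCH%%'' options. The sketch below adds e-mail notifications and a choice of partition and node; the ''%%abc1de%%'' address, ''%%gpu%%'' and ''%%lynx05%%'' are placeholders taken from this page, not recommendations. Lines starting with ''%%##SBATCH%%'' are not read by ''%%sbatch%%'', so those optional settings stay disabled until you delete one ''%%#%%''. ''%%SLURM_JOB_ID%%'' is an environment variable SLURM sets inside a job.

<code bash>
#!/bin/bash
# Sketch of a fuller job script. sbatch only reads #SBATCH lines that come
# before the first executable command.
# Job name shown by squeue, and where to write stdout and stderr:
#SBATCH --job-name="Slurm Simple Test Job"
#SBATCH --output="my_job.output"
#SBATCH --error="my_job.err"
# Optional settings: lines starting with ##SBATCH are ignored by sbatch.
# Delete one leading # to enable them, and replace the placeholders
# (abc1de, gpu, lynx05) with your own values.
##SBATCH --mail-type=ALL
##SBATCH --mail-user=abc1de@virginia.edu
##SBATCH --partition=gpu
##SBATCH --nodelist=lynx05

# Inside a job SLURM sets SLURM_JOB_ID. When the script is run directly
# with bash, every #SBATCH line is just a comment and the variable is unset.
node=$(uname -n)
echo "job ${SLURM_JOB_ID:-none} on $node"
</code>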
  
<code>
…
slurm5
</code>
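
To reuse the job ID from ''%%sbatch%%'' in a script, the ''%%--parsable%%'' option makes ''%%sbatch%%'' print just the ID (optionally followed by '';cluster'') instead of ''%%Submitted batch job 640768%%''. A minimal sketch; ''%%submit_job%%'' and ''%%default_output_file%%'' are hypothetical helper names:

<code bash>
#!/bin/bash
# Hypothetical helper: submit a job script and print only its job ID.
# With --parsable, sbatch prints "jobid" (or "jobid;cluster") instead of
# "Submitted batch job <jobid>".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return
    echo "${out%%;*}"    # drop the optional ";cluster" suffix
}

# Hypothetical helper: the file sbatch writes to when the job script sets
# no --output option (slurm-<jobid>.out in the submit directory).
default_output_file() {
    echo "slurm-$1.out"
}

# Usage on the portal:
#   jobid=$(submit_job slurm.test) && echo "submitted job $jobid"
#   squeue -j "$jobid"
</code>

If you simply want to block until the job finishes, ''%%sbatch --wait%%'' does that without any polling.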

=== Direct login to servers (without a job script) ===
  
You can use ''%%srun%%'' to log in directly to a server controlled by the SLURM job scheduler. This is useful for debugging, as well as for running your applications without a job script. Logging in this way also reserves the server for your exclusive use.
  
-To spawn a shell we must pass the ''​%%--pty%%''​ option to ''​%%srun%%''​ so output is directed to a pseudo-terminal:​+We must pass the ''​%%--pty%%''​ option to ''​%%srun%%''​ so output is directed to a pseudo-terminal:​
  
<code>
…
</code>
The ''%%-w%%'' argument selects the server to log in to. The ''%%-i%%'' argument tells ''%%bash%%'' to run as an interactive shell. The ''%%-l%%'' argument tells bash to start as a login shell; together with the final ''%%-%%'', this resets environment variables that might otherwise cause issues when using [[linux_environment_modules|Environment Modules]].
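
The ''%%srun%%'' invocation can be wrapped in a small shell function that adds ''%%-p%%'' only when a partition is given, since servers outside the default "main" partition need it. This is a sketch: ''%%login_node%%'' is a hypothetical name, and ''%%slurm1%%'' and ''%%lynx05%%'' are example server names.

<code bash>
#!/bin/bash
# Hypothetical helper wrapping: srun -w <server> [-p <partition>] --pty bash -i -l -
# Usage:
#   login_node slurm1        # a server in the default "main" partition
#   login_node lynx05 gpu    # a server in the "gpu" partition
login_node() {
    local node=$1 partition=${2:-}
    local args=(-w "$node")          # -w: the server to log in to
    if [ -n "$partition" ]; then
        args+=(-p "$partition")      # -p: needed outside the "main" partition
    fi
    # --pty attaches a pseudo-terminal; "bash -i -l -" starts an
    # interactive login shell.
    srun "${args[@]}" --pty bash -i -l -
}
</code>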
  
If a server is in a partition other than the default "main" partition (for example, the "gpu" partition; see below for partition information), you //must// specify the partition with ''%%-p%%'', for example:
<code>
abc1de@portal ~$ srun -w lynx05 -p gpu --pty bash -i -l -
…
            467039      main    sleep   abc1de  R       0:06      1 artemis1   <-- Running job
abc1de@portal01 ~ $ scancel 467039
</code>

The default signal sent to a running job is SIGTERM (terminate). To send a different signal to the job's processes, use the ''%%--signal%%'' option of ''%%scancel%%''. For example, SIGKILL is often needed when SIGTERM does not terminate the process:
<code>
abc1de@portal01 ~ $ scancel --signal=KILL 467039
</code>
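
''%%scancel%%'' can also select jobs by attributes instead of by job ID. A minimal sketch that cancels all of your own jobs with a given name, optionally with a specific signal; ''%%cancel_by_name%%'' is a hypothetical name, while ''%%--user%%'', ''%%--name%%'' and ''%%--signal%%'' are standard ''%%scancel%%'' options:

<code bash>
#!/bin/bash
# Hypothetical helper: cancel all of your jobs with the given job name.
# Without a second argument, scancel's default behaviour is used.
# Usage:
#   cancel_by_name "Slurm Simple Test Job"
#   cancel_by_name "Slurm Simple Test Job" KILL
cancel_by_name() {
    local name=$1 signal=${2:-}
    local args=(--user="$USER" --name="$name")   # only your own jobs
    if [ -n "$signal" ]; then
        args+=(--signal="$signal")
    fi
    scancel "${args[@]}"
}
</code>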
  
compute_slurm.txt · Last modified: 2022/04/04 14:25 (external edit)