The ''%%sinfo%%'' command lists the cluster's partitions and the current state of their nodes:
<code>
[abc1de@portal04 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*        up   infinite      3  drain falcon[3-5]
</code>
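
The ''%%STATE%%'' column reports node availability; ''%%drain%%'' means a node is not accepting new jobs. To see the reason each node was drained, ''%%sinfo%%'' can list reasons. A minimal sketch using the standard ''%%-R%%''/''%%--list-reasons%%'' flag:

<code>
[abc1de@portal04 ~]$ sinfo -R
</code>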
  
Use ''%%squeue%%'' to see the jobs that are currently queued or running:

<code>
abc1de@portal01 ~ $ squeue
             JOBID PARTITION     NAME     USER    ST     TIME  NODES NODELIST(REASON)
            467039      main    my_job    abc1de  R      0:06      1 artemis1
</code>
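
''%%squeue%%'' accepts filters as well; for example, to list only your own jobs with the standard ''%%-u%%''/''%%--user%%'' flag:

<code>
abc1de@portal01 ~ $ squeue -u abc1de
</code>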
  
Here ''%%sinfo%%'' shows the ''%%main%%'' partition with 37 nodes in the ''%%idle%%'' state, available to accept jobs:
  
<code>
abc1de@portal01 ~ $ sinfo
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*            up   infinite     37   idle hermes[1-4],artemis[2-7],slurm[1-5],nibbler[1-4],trillian[1-3],granger[1-6],granger[7-8],ai0[1-6]
</code>

Below is an excerpt of the example job script ''%%test.sh%%'' used in this walkthrough, showing ''%%#SBATCH%%'' directives for email notification and error output:

<code>
#
#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc1de@virginia.edu
#
#SBATCH --error="my_job.err"                    # Where to write std err
</code>
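
For context, a complete minimal job script in this style might look like the sketch below. This is not the full ''%%test.sh%%'' from this page; the job name, output file, and final ''%%hostname%%'' command are inferred from the ''%%squeue%%'', ''%%ls%%'', and ''%%cat%%'' output shown in this walkthrough:

<code>
#!/bin/bash
#SBATCH --job-name=my_job                       # Name shown in squeue
#SBATCH --ntasks=1                              # Run a single task
#
#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc1de@virginia.edu
#
#SBATCH --output="my_job.output"                # Where to write std out
#SBATCH --error="my_job.err"                    # Where to write std err

hostname                                        # The job's work: print the node name
</code>
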
Make the script executable and submit it with ''%%sbatch%%''. When the job completes, its output appears in the files named in the script:
  
<code>
abc1de@portal01 ~ $ cd slurm-test/
abc1de@portal01 ~/slurm-test $ chmod +x test.sh
abc1de@portal01 ~/slurm-test $ sbatch test.sh
Submitted batch job 466977
abc1de@portal01 ~/slurm-test $ ls
my_job.err  my_job.output  test.sh
abc1de@portal01 ~/slurm-test $ cat my_job.output
slurm1
</code>
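
While the job waits or runs, it appears in ''%%squeue%%''. After it finishes, the ''%%sacct%%'' command can report its final state, //if// Slurm accounting is enabled on the cluster (an assumption; this page does not say):

<code>
abc1de@portal01 ~/slurm-test $ sacct -j 466977
</code>
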
''%%srun%%'' can also run a single command directly across a set of nodes. For example, to run ''%%hostname%%'' on five specific nodes at once:
  
<code>
abc1de@portal01 ~ $ srun -w slurm[1-5] -N5 hostname
slurm4
slurm1
slurm2
slurm3
slurm5
</code>
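
In that command, ''%%-w%%'' names the nodes and ''%%-N%%'' sets how many nodes to use. Slurm also has a standard ''%%-n%%''/''%%--ntasks%%'' flag to request a number of tasks rather than nodes; a sketch:

<code>
abc1de@portal01 ~ $ srun -n8 hostname
</code>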

=== Direct login to servers (without a job script) ===

You can use ''%%srun%%'' to log in directly to a server controlled by the SLURM job scheduler. This is useful for debugging, and for running applications without a job script. Logging in directly also reserves the node for your exclusive use.

To spawn a shell, we must pass the ''%%--pty%%'' option to ''%%srun%%'' so that output is directed to a pseudo-terminal:
 +
<code>
abc1de@portal ~$ srun -w cortado04 --pty bash -i -l -
abc1de@cortado04 ~$ hostname
cortado04
abc1de@cortado04 ~$
</code>

The ''%%-w%%'' argument selects the server to log into. The ''%%-i%%'' argument tells ''%%bash%%'' to run as an interactive shell. The ''%%-l%%'' argument tells ''%%bash%%'' that this is a login shell; this, together with the final ''%%-%%'', is important for resetting environment variables that could otherwise cause problems when using [[linux_environment_modules|Environment Modules]].

If a node is in a partition (see below for partition information) other than the default "main" partition (for example, the "gpu" partition), then you //must// specify the partition in your command, for example:
<code>
abc1de@portal ~$ srun -w lynx05 -p gpu --pty bash -i -l -
</code>
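
A directly reserved node stays allocated until your shell exits, so log out as soon as you are finished to release it for other users; for example, from the earlier ''%%cortado04%%'' session:

<code>
abc1de@cortado04 ~$ exit
</code>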
  
=== Cancelling a Job ===

To cancel a running job, pass the job ID reported by ''%%squeue%%'' to ''%%scancel%%'':

<code>
abc1de@portal01 ~ $ scancel 467039
</code>

The default signal sent to a running job is SIGTERM (terminate). If you wish to send a different signal to the job's processes (for example SIGKILL, which is often needed when SIGTERM does not terminate the process), use the ''%%-s%%''/''%%--signal%%'' argument to ''%%scancel%%'', e.g.:

<code>
abc1de@portal01 ~ $ scancel --signal=KILL 467039
</code>
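
''%%scancel%%'' can also select jobs by attribute instead of job ID; for example, the standard ''%%-u%%''/''%%--user%%'' flag cancels all of your own jobs at once:

<code>
abc1de@portal01 ~ $ scancel -u abc1de
</code>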
  
=== Queues/Partitions ===
  
Slurm refers to job queues as //partitions//. We group similar systems into separate queues. For example, there is a "main" queue for general-purpose systems and a "gpu" queue for systems with GPUs. Each queue can have unique constraints, such as which compute nodes it contains, its maximum runtime, and its resource limits.
  
A partition is indicated with ''%%-p partname%%'' or ''%%--partition partname%%''.
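
The same option works as a directive inside a job script; a sketch selecting the "gpu" partition described above:

<code>
#SBATCH --partition=gpu
</code>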
To request GPUs on nodes in the "gpu" partition, use Slurm's generic resource flag ''%%--gres%%''. For example, to request four GPUs:
<code>
--gres=gpu:4
</code>
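
Partition selection and GPU requests are combined in practice; for example, a sketch of an interactive session on the "gpu" partition requesting a single GPU (flags assembled from the examples above):

<code>
abc1de@portal ~$ srun -p gpu --gres=gpu:1 --pty bash -i -l -
</code>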
  