Differences

This shows you the differences between two versions of the page.

compute_slurm [2020/05/26 20:23] pgh5a
compute_slurm [2020/08/03 13:12] pgh5a [Terminating Jobs]
Line 23:
 </code>
 
-With ''%%sinfo%%'' we can see a listing of the job queues or "partitions" and a list of nodes associated with these partitions.  A partition is a grouping of nodes, for example our //main// partition is a group of all SLURM nodes that are not reserved and can be used by anyone, and the //gpu// partition is a group of nodes that each contain GPUs.  Sometimes hosts can be listed in two or more partitions.
+With ''%%sinfo%%'' we can see a listing of the job queues or "partitions" and a list of nodes associated with these partitions.  A partition is a grouping of nodes, for example our //main// partition is a group of all general purpose nodes, and the //gpu// partition is a group of nodes that each contain GPUs.  Sometimes hosts can be listed in two or more partitions.
 
 To view jobs running on the queue, we can use the command ''%%squeue%%''.  Say we have submitted one job to the main partition, running ''%%squeue%%'' will look like this:
Line 30:
 pgh5a@portal01 ~ $ squeue
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
-            467039      main    sleep    pgh5a  R       0:06      1 artemis1
+            467039      main    my_job    pgh5a  R      0:06      1 artemis1
 </code>
  
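The columns in the ''%%squeue%%'' listing above can be processed with standard text tools. The following is a minimal sketch, not part of the wiki page: it assumes the default column layout shown above (state in the fifth column, ''ST'') and job names without spaces, and the function name ''jobs_by_state'' is hypothetical.

```shell
# Hypothetical helper: count jobs per state (the ST column) in squeue's
# default table, read from standard input. Assumes NAME contains no spaces.
jobs_by_state() {
    # Skip the header row, tally column 5, then print "STATE COUNT" pairs.
    awk 'NR > 1 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# On a cluster this would be used as:
#   squeue | jobs_by_state
```

Reading from standard input keeps the helper independent of the scheduler, so it can be tried on saved output before running it against the live queue.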
Line 96:
 ==== Terminating Jobs ====
 
-Please be aware of jobs you start and make sure that they finish executing.  If your job does not converge, it will sit in the queue taking up resources and preventing others from running their jobs.
+Please be aware of jobs you start and make sure that they finish executing.  If your job does not exit gracefully, it will continue running on the server, taking up resources and preventing others from running their jobs.
 
 To cancel a running job, use the ''%%scancel [jobid]%%'' command
 
 <code>
-ktm5j@portal01 ~ $ squeue
+abc1de@portal01 ~ $ squeue
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
-            467039      main    sleep    ktm5j  R       0:06      1 artemis1           <--  Running job
+            467039      main    sleep    abc1de  R       0:06      1 artemis1           <--  Running job
-ktm5j@portal01 ~ $ scancel 467039
+abc1de@portal01 ~ $ scancel 467039
 </code>
  
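The two steps above (find the job ID with ''%%squeue%%'', then pass it to ''%%scancel%%'') can be combined. The following is a rough sketch, not part of the wiki page: it assumes the default ''%%squeue%%'' columns shown above, job names without spaces, and a user name short enough not to be truncated in the USER column. The function name ''running_job_ids'' is hypothetical.

```shell
# Hypothetical helper: print the JOBID of every job in state R that belongs
# to the given user, reading squeue's default table from standard input.
running_job_ids() {
    # Column 1 is JOBID, column 4 is USER, column 5 is ST (job state).
    awk -v user="$1" 'NR > 1 && $4 == user && $5 == "R" { print $1 }'
}

# On a cluster, each running job could then be cancelled in turn:
#   squeue | running_job_ids "$USER" | while read -r jobid; do
#       scancel "$jobid"
#   done
```

Printing the IDs first and cancelling in a separate step makes it easy to check the list before anything is terminated.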
  • compute_slurm.txt
  • Last modified: 2020/09/24 18:23
  • by pgh5a