Differences

This shows you the differences between two versions of the page.

compute_slurm [2020/09/24 13:58]
pgh5a
compute_slurm [2020/10/06 12:59] (current)
pgh5a
Line 29: Line 29:
 <​code>​ <​code>​
 abc1de@portal01 ~ $ squeue abc1de@portal01 ~ $ squeue
-             JOBID PARTITION ​    ​NAME ​    USER ST       ​TIME  NODES NODELIST(REASON) +             JOBID PARTITION ​    ​NAME ​    ​USER ​   ST     ​TIME  NODES NODELIST(REASON) 
-            467039 ​     main    my_job ​   ​pgh5a  ​R ​     0:06      1 artemis1+            467039 ​     main    my_job ​   ​abc1de ​ ​R ​     0:06      1 artemis1
 </​code>​ </​code>​
  
Line 61: Line 61:
 # #
 #SBATCH --mail-type=ALL #SBATCH --mail-type=ALL
-#SBATCH --mail-user=pgh5a@virginia.edu+#SBATCH --mail-user=abc1de@virginia.edu
 # #
 #SBATCH --error="​my_job.err" ​                   # Where to write std err #SBATCH --error="​my_job.err" ​                   # Where to write std err
Line 93: Line 93:
 slurm5 slurm5
 </​code>​ </​code>​
- 
-=== Terminating Jobs === 
- 
-Please be aware of jobs you start and make sure that they finish executing. ​ If your job does not exit gracefully, it will continue running on the server, taking up resources and preventing others from running their jobs. 
- 
-To cancel a running job, use the ''​%%scancel [jobid]%%''​ command 
- 
-<​code>​ 
-abc1de@portal01 ~ $ squeue 
-             JOBID PARTITION ​    ​NAME ​    USER ST       ​TIME ​ NODES NODELIST(REASON) 
-            467039 ​     main    sleep    abc1de ​ R       ​0:​06 ​     1 artemis1 ​          <​-- ​ Running job 
-abc1de@portal01 ~ $ scancel 467039 
-</​code>​ 
- 
 === Direct login to servers (without a job script) === === Direct login to servers (without a job script) ===
  
Line 122: Line 108:
 The ''%%-w%%'' argument selects the server to log in to. The ''%%-i%%'' argument tells ''%%bash%%'' to run as an interactive shell. The ''%%-l%%'' argument tells ''%%bash%%'' that this is a login shell; this, along with the final ''%%-%%'', is important for resetting environment variables that might otherwise cause issues when using [[linux_environment_modules|Environment Modules]]. The ''%%-w%%'' argument selects the server to log in to. The ''%%-i%%'' argument tells ''%%bash%%'' to run as an interactive shell. The ''%%-l%%'' argument tells ''%%bash%%'' that this is a login shell; this, along with the final ''%%-%%'', is important for resetting environment variables that might otherwise cause issues when using [[linux_environment_modules|Environment Modules]].
  
-If a node is in a partition other than the default "​main"​ partition (for example, the "​gpu"​ partition), then you //must// specify the partition in your command, for example:+If a node is in a partition ​(see below for partition information) ​other than the default "​main"​ partition (for example, the "​gpu"​ partition), then you //must// specify the partition in your command, for example:
 <​code>​ <​code>​
 abc1de@portal ~$ srun -w lynx05 -p gpu --pty bash -i -l - abc1de@portal ~$ srun -w lynx05 -p gpu --pty bash -i -l -
 </​code>​ </​code>​
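The partition rule above can be wrapped in a small shell helper. This is only a sketch: the function name ''srun_login_cmd'' is hypothetical and not part of Slurm or of this wiki. It prints the ''srun'' command rather than running it, and assumes that omitting ''-p'' selects the default "main" partition.

```shell
# Hypothetical helper (not a Slurm command): print the srun command
# for an interactive login to a node.
# Usage: srun_login_cmd NODE [PARTITION]
srun_login_cmd() {
    if [ -z "${1:-}" ]; then
        echo "usage: srun_login_cmd NODE [PARTITION]" >&2
        return 1
    fi
    if [ -n "${2:-}" ]; then
        # Node is outside the default partition, so -p is required
        printf 'srun -w %s -p %s --pty bash -i -l -\n' "$1" "$2"
    else
        # Assumption: no -p means the default "main" partition is used
        printf 'srun -w %s --pty bash -i -l -\n' "$1"
    fi
}
```

For example, ''srun_login_cmd lynx05 gpu'' prints the same command as the example above, which can then be reviewed before running it.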
 +
 +=== Terminating Jobs ===
 +
 +Please be aware of jobs you start and make sure that they finish executing. ​ If your job does not exit gracefully, it will continue running on the server, taking up resources and preventing others from running their jobs.
 +
 +To cancel a running job, use the ''​%%scancel [jobid]%%''​ command
 +
 +<​code>​
 +abc1de@portal01 ~ $ squeue
 +             JOBID PARTITION ​    ​NAME ​    USER ST       ​TIME ​ NODES NODELIST(REASON)
 +            467039 ​     main    sleep    abc1de ​ R       ​0:​06 ​     1 artemis1 ​          <​-- ​ Running job
 +abc1de@portal01 ~ $ scancel 467039
 +</​code>​
 +
 +The default signal sent to a running job is SIGTERM (terminate). To send a different signal to the job's processes (for example SIGKILL, which is often needed when SIGTERM does not terminate the process), use the ''%%--signal%%'' argument (short form ''%%-s%%'') of ''%%scancel%%'', for example:
 +<​code>​
 +abc1de@portal01 ~ $ scancel --signal=KILL 467039
 +</​code>​
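The two cancellation steps above can be combined into one helper that cancels a job and escalates to SIGKILL only if the job is still listed afterwards. This is a minimal sketch, not a site-provided tool: the function name ''cancel_job'' and the 30-second default grace period are assumptions chosen for illustration.

```shell
# Hypothetical helper (not a Slurm command): cancel a job, then escalate
# to SIGKILL if the job is still listed after a grace period.
# Usage: cancel_job JOBID [GRACE_SECONDS]
cancel_job() {
    # Plain cancel first, as in the example above
    scancel "$1" || return 1
    # Assumed grace period of 30 seconds unless overridden
    sleep "${2:-30}"
    # squeue -h suppresses the header, so any output means the job is still listed
    if [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]; then
        scancel --signal=KILL "$1"
    fi
}
```

For example, ''cancel_job 467039 60'' waits a minute before deciding whether the KILL signal is needed.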
 +
  
 === Queues/​Partitions === === Queues/​Partitions ===
  • compute_slurm.1600955923.txt.gz
  • Last modified: 2020/09/24 13:58
  • by pgh5a