==== Scheduling a Job using the SLURM job scheduler ====

{{ ::introtoslurm.pdf | Intro to SLURM slides }}

The Computer Science Department uses a "job scheduler" called [[https://en.wikipedia.org/wiki/Slurm_Workload_Manager|SLURM]]. The purpose of a job scheduler is to allocate computational resources (servers) to users who submit "jobs" to a queue. The scheduler reads the requirements stated in a job's script and allocates a server (or servers) that matches them. For example, if the job script specifies that a job needs 192GB of memory, the job scheduler will find a server with at least that much memory free.
=== Using SLURM ===
**[[https://slurm.schedmd.com/pdfs/summary.pdf|Slurm Commands Cheat Sheet]]**
=== Information Gathering ===

To view information about compute nodes in the SLURM system, use the command ''%%sinfo%%''.

<code>
[abc1de@portal04 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*        up   infinite      3  drain falcon[3-5]
main*        up   infinite     27   idle cortado[01-10],falcon[1-2,6-10],lynx[08-12],slurm[1-5]
gpu          up   infinite      4    mix ai[02-03,05],lynx07
gpu          up   infinite      1  alloc ai04
gpu          up   infinite     12   idle ai[01,06],lynx[01-06],ristretto[01-04]
</code>
With ''%%sinfo%%'' we can see a listing of the job queues or "partitions" and the nodes associated with each. A partition is a grouping of nodes; for example, our //main// partition is a group of all general purpose nodes, and the //gpu// partition is a group of nodes that each contain GPUs. A host can be listed in two or more partitions.
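If you only want to see the nodes in a single partition, you can pass the partition name to ''%%sinfo%%'' with ''%%-p%%''; the output below is simply the //gpu// portion of the listing above:

<code>
[abc1de@portal04 ~]$ sinfo -p gpu
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu          up   infinite      4    mix ai[02-03,05],lynx07
gpu          up   infinite      1  alloc ai04
gpu          up   infinite     12   idle ai[01,06],lynx[01-06],ristretto[01-04]
</code>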
To view the jobs currently in the queue, we can use the command ''%%squeue%%''. Say we have submitted one job to the //main// partition; running ''%%squeue%%'' will look like this:
<code>
abc1de@portal01 ~ $ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            467039      main   my_job   abc1de  R       0:06      1 artemis1
</code>
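On a busy cluster the queue can be long; the standard ''%%-u%%'' flag limits the listing to a single user's jobs (''%%abc1de%%'' is the placeholder username from the example above):

<code>
abc1de@portal01 ~ $ squeue -u abc1de
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            467039      main   my_job   abc1de  R       0:06      1 artemis1
</code>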
While the job runs, ''%%sinfo%%'' shows that its node has been allocated; note that ''%%artemis1%%'' no longer appears in the idle list of the //main// partition:

<code>
abc1de@portal01 ~ $ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*        up   infinite     37   idle hermes[1-4],artemis[2-7],slurm[1-5],nibbler[1-4],trillian[1-3],granger[1-6],granger[7-8],ai0[1-6]
</code>
=== Jobs ===

To use SLURM resources, you must submit your jobs (program/script/etc.) to the SLURM controller. The controller will then send your job to compute nodes for execution, after which your results will be returned.

Users can submit SLURM jobs from ''%%portal.cs.virginia.edu%%''. From a shell, you can submit jobs using the commands [[https://slurm.schedmd.com/srun.html|srun]] or [[https://slurm.schedmd.com/sbatch.html|sbatch]]. Let's look at a very simple example script and ''%%sbatch%%'' command.

Here is our script; all it does is print the hostname of the server running the script. We add ''%%#SBATCH%%'' directives to the script to set various SLURM options.
<code bash>
#!/bin/bash
# --- this job will be run on any available node
# and simply output the node's hostname to
# my_job.output
#SBATCH --job-name="Slurm Simple Test Job"
#SBATCH --error="my_job.err"
#SBATCH --output="my_job.output"
echo "$HOSTNAME"
</code>
We run the script with ''%%sbatch%%'' and the results will be put in the file we specified with ''%%--output%%''. If no output file is specified, output is saved to a file named after the SLURM jobid (''%%slurm-<jobid>.out%%'').
<code>
[abc1de@portal04 ~]$ sbatch slurm.test
Submitted batch job 640768
[abc1de@portal04 ~]$ more my_job.output
cortado06
</code>
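The same ''%%#SBATCH%%'' mechanism is used to request specific resources, which is how the scheduler matches jobs to servers as described in the introduction. The script below is a minimal sketch; the job name, file names, and resource amounts are made-up values chosen for illustration, not site requirements or defaults:

<code bash>
#!/bin/bash
# Hypothetical example: ask for 1 task with 4 CPU cores,
# 16GB of memory, and a 2 hour time limit.
#SBATCH --job-name="resource-example"
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output="resource-example.output"
#SBATCH --error="resource-example.err"

echo "Running on $HOSTNAME with $SLURM_CPUS_PER_TASK CPU cores"
</code>

Submit it with ''%%sbatch%%'' exactly as in the example above.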
The ''%%srun%%'' command can also run a program directly on compute nodes and return its output, without a batch script. For example, to run ''%%hostname%%'' on five nodes at once:
<code>
abc1de@portal01 ~ $ srun -w slurm[1-5] -N5 hostname
slurm4
slurm1
slurm2
slurm3
slurm5
</code>
=== Direct login to servers (without a job script) ===

You can use ''%%srun%%'' to log in directly to a server controlled by the SLURM job scheduler. This can be useful for debugging as well as for running your applications without a job script. It also reserves the server for your exclusive use.

We must pass the ''%%--pty%%'' option to ''%%srun%%'' so output is directed to a pseudo-terminal:

<code>
abc1de@portal ~$ srun -w cortado04 --pty bash -i -l -
abc1de@cortado04 ~$ hostname
cortado04
abc1de@cortado04 ~$
</code>
The ''%%-w%%'' argument selects the server to log in to. The ''%%-i%%'' argument tells ''%%bash%%'' to run as an interactive shell, and the ''%%-l%%'' argument tells it to behave as a login shell. The ''%%-l%%'' flag and the final ''%%-%%'' are important because they reset environment variables that might otherwise cause issues when using [[linux_environment_modules|Environment Modules]].

If a node is in a partition (see below for partition information) other than the default "main" partition (for example, the "gpu" partition), then you //must// specify the partition in your command, for example:

<code>
abc1de@portal ~$ srun -w lynx05 -p gpu --pty bash -i -l -
</code>
=== Terminating Jobs ===

Please be aware of jobs you start and make sure that they finish executing. If your job does not exit gracefully, it will continue running on the server, taking up resources and preventing others from running their jobs.

To cancel a running job, use the ''%%scancel [jobid]%%'' command:

<code>
abc1de@portal01 ~ $ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            467039      main    sleep   abc1de  R       0:06      1 artemis1   <-- Running job
abc1de@portal01 ~ $ scancel 467039
</code>
The default signal sent to a running job is SIGTERM (terminate). If you wish to send a different signal to the job's processes (for example, a SIGKILL, which is often needed if a SIGTERM doesn't terminate the process), use the ''%%-s%%'' / ''%%--signal%%'' argument to ''%%scancel%%'', i.e.:

<code>
abc1de@portal01 ~ $ scancel --signal=KILL 467039
</code>
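''%%scancel%%'' can also select jobs by user instead of by job ID, which is handy for cleaning up several jobs at once (''%%abc1de%%'' is the placeholder username used in the examples on this page):

<code>
abc1de@portal01 ~ $ scancel -u abc1de
</code>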
=== Queues/Partitions ===

Slurm refers to job queues as //partitions//. We group similar systems into separate queues. For example, there is a "main" queue for general purpose systems, and a "gpu" queue for systems with GPUs. These queues can have unique constraints such as compute nodes, max runtime, resource limits, etc.

A partition is selected with ''%%-p partname%%'' or ''%%--partition partname%%''.
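The partition can also be set inside a job script with an ''%%#SBATCH%%'' directive. A minimal sketch (the job name is made up for illustration):

<code bash>
#!/bin/bash
# Hypothetical example: explicitly submit this job to the main partition
#SBATCH --job-name="partition-example"
#SBATCH --partition=main

echo "$HOSTNAME"
</code>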
An example running the command ''%%hostname%%'' on the //main// partition; this will run on any node in the partition:

<code>
srun -p main hostname
</code>
=== Using GPUs ===

Slurm handles GPUs and other non-CPU computing resources using what it calls [[https://slurm.schedmd.com/gres.html|GRES]] (Generic RESources). To use the GPU(s) on a system through Slurm, with either ''%%sbatch%%'' or ''%%srun%%'', you must request them with the ''%%--gres%%'' option: give the resource name (''%%gpu%%''), followed by ''%%:%%'' and the quantity of resources, for example ''%%--gres=gpu:1%%''.
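Below is a minimal sketch of a job script that requests GPUs; the job name, output file, and GPU count are made-up values for illustration, and the //gpu// partition is the one shown in the ''%%sinfo%%'' listing above:

<code bash>
#!/bin/bash
# Hypothetical example: request 2 GPUs in the gpu partition
#SBATCH --job-name="gpu-example"
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
#SBATCH --output="gpu-example.output"

# Print GPU status on the allocated node
nvidia-smi
</code>

The same request works on the ''%%srun%%'' command line, for example ''%%srun -p gpu --gres=gpu:1 nvidia-smi%%''.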
=== Reservations ===

Reservations for specific resources or nodes can be made by submitting a request to <cshelpdesk@virginia.edu>. For more information about using reservations, see the main article on [[compute_slurm_reservations|SLURM Reservations]].

=== Note on Modules in Slurm ===

Due to the way sbatch spawns a bash session (a non-login session), some init files are not loaded from ''%%/etc/profile.d%%''. This prevents the initialization of the [[linux_environment_modules|Environment Modules]] system and will prevent you from loading software modules.

To fix this, simply include the following line in your sbatch scripts:

<code bash>
source /etc/profile.d/modules.sh
</code>
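For example, a batch script that uses Environment Modules might look like this minimal sketch; ''%%my_program%%'' and the ''%%gcc%%'' module are placeholders for whatever software your job actually needs:

<code bash>
#!/bin/bash
#SBATCH --job-name="module-example"
#SBATCH --output="module-example.output"

# Initialize Environment Modules in the non-login batch shell
source /etc/profile.d/modules.sh

# Load a module (placeholder name) and run a program that depends on it
module load gcc
./my_program
</code>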