Scheduling a Job using the SLURM job scheduler
The Computer Science Department uses a "job scheduler" called SLURM. The purpose of a job scheduler such as SLURM is to allocate computational resources to users who submit "jobs" to a queue. The scheduler reads the requirements stated in the job script and allocates a server (or servers) that matches them.
Note on Modules in Slurm
Due to the way sbatch spawns a bash session (a non-login session), some init files from /etc/profile.d are not loaded. This prevents the initialization of the Environment Modules system and will prevent you from loading software modules.
To fix this, simply include the following line in your sbatch scripts:
source /etc/profile.d/modules.sh
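For example, a minimal sketch of a batch script that initializes Environment Modules before loading software (the module name here is hypothetical; substitute one listed by module avail on the cluster):

#!/bin/bash
#SBATCH --job-name="module-example"
#SBATCH --output="module_example.output"

# Initialize Environment Modules in the non-login shell spawned by sbatch
source /etc/profile.d/modules.sh

# Load a software module (hypothetical name; use one from `module avail`)
module load python
python --version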
Using SLURM
Information Gathering
SLURM provides a set of tools for interacting with the scheduling system. To view information about compute nodes in the SLURM system, we can use the sinfo command.
pgh5a@portal01 ~ $ sinfo
PARTITION      AVAIL  TIMELIMIT  NODES  STATE  NODELIST
main*          up     infinite   37     idle   hermes[1-4],artemis[1-7],slurm[1-5],nibbler[1-4],trillian[1-3],granger[1-6],granger[7-8],ai0[1-6]
qdata          up     infinite   8      idle   qdata[1-8]
qdata-preempt  up     infinite   8      idle   qdata[1-8]
falcon         up     infinite   10     idle   falcon[1-10]
intel          up     infinite   24     idle   artemis7,slurm[1-5],granger[1-6],granger[7-8],nibbler[1-4],ai0[1-6]
amd            up     infinite   13     idle   hermes[1-4],artemis[1-6],trillian[1-3]
With sinfo we can see a listing of what SLURM calls partitions and the nodes associated with each. A partition is a grouping of nodes; for example, our main partition is a group of all SLURM nodes that are not reserved and can be used by anyone. Notice that hosts can be listed in several different partitions. For example, slurm[1-5] can be found in both the main and intel partitions. intel is a partition of systems with Intel processors; likewise, the hosts in amd have AMD processors.
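sinfo also accepts a -p flag that limits its output to a single partition, which is handy for checking just the nodes you plan to use:

pgh5a@portal01 ~ $ sinfo -p intel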
To view jobs running on the queue, we can use the squeue command. Say we have submitted one job to the main partition; running squeue will look like this:
pgh5a@portal01 ~ $ squeue
  JOBID PARTITION   NAME   USER ST  TIME  NODES NODELIST(REASON)
 467039      main  sleep  pgh5a  R  0:06      1 artemis1
Now that a node has been allocated, that node (artemis1) will show as alloc in sinfo:
pgh5a@portal01 ~ $ sinfo
PARTITION      AVAIL  TIMELIMIT  NODES  STATE  NODELIST
main*          up     infinite   36     idle   hermes[1-4],artemis[2-7],slurm[1-5],nibbler[1-4],trillian[1-3],granger[1-6],granger[7-8],ai0[1-6]
main*          up     infinite   1      alloc  artemis1
qdata          up     infinite   8      idle   qdata[1-8]
qdata-preempt  up     infinite   8      idle   qdata[1-8]
falcon         up     infinite   10     idle   falcon[1-10]
intel          up     infinite   24     idle   artemis7,slurm[1-5],granger[1-6],granger[7-8],nibbler[1-4],ai0[1-6]
amd            up     infinite   13     idle   hermes[1-4],artemis[1-6],trillian[1-3]
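squeue can likewise be filtered; the -u flag shows only jobs belonging to the given user:

pgh5a@portal01 ~ $ squeue -u pgh5a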
Jobs
To use SLURM resources, you must submit your jobs (program/script/etc.) to the SLURM controller. The controller will then send your job to compute nodes for execution; when the job completes, your results will be returned.
Users can submit SLURM jobs from any of the power servers: power1-power6. From a shell, you can submit jobs using the srun or sbatch commands. Let's look at a very simple example script and sbatch command.
Here is our script; all it does is print the hostname of the server running the script. We add #SBATCH directives to the script to set various SLURM options.
#!/bin/bash
#SBATCH --job-name="Slurm Simple Test Job"  # Name of the job which appears in squeue
#
#SBATCH --mail-type=ALL
#SBATCH --mail-user=pgh5a@virginia.edu
#
#SBATCH --error="my_job.err"                # Where to write stderr
#SBATCH --output="my_job.output"            # Where to write stdout
#SBATCH --nodelist=slurm1

hostname
Let's put this in a directory called slurm-test in our home directory. We run the script with sbatch, and the results will be put in the file we specified with --output. If no output file is specified, output will be saved to a file named after the job id (slurm-<jobid>.out).
pgh5a@power3 ~ $ cd slurm-test/
pgh5a@power3 ~/slurm-test $ chmod +x test.sh
pgh5a@power3 ~/slurm-test $ sbatch test.sh
Submitted batch job 466977
pgh5a@power3 ~/slurm-test $ ls
my_job.err  my_job.output  test.sh
pgh5a@power3 ~/slurm-test $ cat my_job.output
slurm1
Here is a similar example using srun, running on multiple nodes:
pgh5a@power3 ~ $ srun -w slurm[1-5] -N5 hostname
slurm4
slurm1
slurm2
slurm3
slurm5
Terminating Jobs
Please be aware of jobs you start and make sure that they finish executing. If your job hangs or never terminates, it will sit in the queue taking up resources and preventing others from running their jobs.
To cancel a running job, use the scancel [jobid] command:
ktm5j@power1 ~ $ squeue
  JOBID PARTITION   NAME   USER ST  TIME  NODES NODELIST(REASON)
 467039      main  sleep  ktm5j  R  0:06      1 artemis1     <-- Running job
ktm5j@power1 ~ $ scancel 467039
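scancel can also cancel every job belonging to a user at once via the -u flag (it only acts on jobs you have permission to cancel):

ktm5j@power1 ~ $ scancel -u ktm5j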
Partitions
Slurm refers to job queues as partitions. These queues can have unique constraints, such as which compute nodes they include, max runtime, resource limits, etc. There is a main queue, which makes use of all non-GPU compute nodes.
A partition is indicated by -p partname or --partition partname.
To specify a partition in an sbatch file:
#SBATCH --partition=gpu
Or from the command line with srun:
-p gpu
An example running the command hostname on the intel partition; this will run on any node in the partition:
srun -p intel hostname
GPUs
Slurm handles GPUs and other non-CPU computing resources using what are called GRES resources (Generic RESource). To use the GPU(s) on a system through Slurm, with either sbatch or srun, you must request the GPUs using the --gres option: the gres flag is followed by the resource type, a colon, and the quantity of resources.
Say we want to use 4 GPUs on a system; we would use the following sbatch option:
#SBATCH --gres=gpu:4
Or from the command line:
--gres=gpu:4
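Putting these pieces together, a minimal sketch of a GPU batch script might look like the following (the job name, output file, and nvidia-smi check are illustrative):

#!/bin/bash
#SBATCH --job-name="gpu-example"
#SBATCH --output="gpu_example.output"
#SBATCH --partition=gpu
#SBATCH --gres=gpu:4

# Initialize Environment Modules (see the note at the top of this page)
source /etc/profile.d/modules.sh

# List the GPUs allocated to this job
nvidia-smi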
Interactive Shell
We can use srun to spawn an interactive shell on a SLURM compute node. While this can be useful for debugging purposes, it is not how you should typically use the SLURM system. To spawn a shell, we must pass the --pty option to srun so output is directed to a pseudo-terminal:
ktm5j@power3 ~/slurm-test $ srun -w slurm1 --pty bash -i -l -
ktm5j@slurm1 ~/slurm-test $ hostname
slurm1
ktm5j@slurm1 ~/slurm-test $
The -i argument tells bash to run as an interactive shell. The -l argument instructs bash that this is a login shell; this, along with the final -, is important for resetting environment variables that could otherwise cause issues when using Environment Modules.
If a node is in a partition other than the default “main” partition (for example, the “gpu” partition), then you must specify the partition in your command, for example:
ktm5j@power3 ~/slurm-test $ srun -w lynx05 -p gpu --pty bash -i -l -
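The same partition and GRES flags shown above can be combined to get an interactive shell with a GPU attached, for example:

ktm5j@power3 ~/slurm-test $ srun -p gpu --gres=gpu:1 --pty bash -i -l -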
Reservations
Reservations for specific resources or nodes can be made by submitting a request to cshelpdesk@virginia.edu. For more information about using reservations, see the main article on SLURM Reservations.