====== CS Resource Manager (Job Scheduler) ======
===== General Information =====
The UVA Computer Science department uses //Slurm// to manage the compute resources of its research cluster.

Slurm acts as the "job scheduler" for the cluster: it accepts job submissions, allocates the requested CPUs, memory, and GPUs, and runs each job on an appropriate compute node once the resources become available.

==== Terminology & Important General Information ====
  * Servers managed by the **slurm scheduler** are referred to as **nodes**, **slurm nodes**, or **compute nodes**

  * A collection of nodes controlled by the **slurm scheduler** is referred to as a **cluster**

  * **tasks** in slurm can be considered as individual processes

  * **CPUs** in slurm can be considered as individual cores on a processor
    * A job that allocates a single CPU is a single process program within a single task
    * A job that allocates multiple CPUs on the same node is a multicore program within a single task
    * A job that allocates CPU(s) on multiple nodes (a distributed program) runs a task on each node

  * **GPUs** in slurm are referred to as a Generic Resource (**GRES**)
    * Using a specific **GRES** requires specifying the string associated with the GRES (see the sketch after this list)
    * For example, using ''--gres=gpu:1'' requests one GPU card of any model
    * Using ''--constraint=<feature>'' together with ''--gres'' requests a specific GPU model (see the resource tables below)

  * **SSH logins to slurm nodes are disabled**. Interactive jobs are required for accessing a node's resources directly (see the Interactive Job section below)
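As a hedged sketch of how these pieces fit together (the partition, core count, memory amount, and time limit are arbitrary examples, not recommendations), a request for a task, CPUs, memory, and a GPU GRES might look like:
<code>
# Example only: one task with 4 cores, 16GBs of memory, one GPU of any model,
# and a 2 hour time limit, run as an interactive shell in the gpu partition
~$ srun -p gpu -n 1 -c 4 --mem=16000 --gres=gpu:1 -t 02:00:00 --pty bash -i -l --
</code>
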
==== Environment Overview ====
The slurm scheduler runs as a daemon process: ''slurmctld'' on the controller and ''slurmd'' on each compute node. The scheduler tracks the resources available on every node and assigns queued jobs to nodes as the requested resources become free.

Head login nodes such as ''portal'' are used to submit jobs to the cluster; jobs do not run on the login nodes themselves.

Software modules are made available throughout the CS environment, so the same modules that can be loaded on the login nodes are also available on the slurm nodes.

----

===== Updates & Announcements =====
  * **January 2025**
    * Resource restrictions for each partition have been added, replacing the previous concurrent running job limit. Further, jobs are now assigned a default time limit
      * See the ([[compute_slurm#resource_limits|Resource Limits]]) section below for details

    * Reservations may circumvent this limit by using the QoS ''csresnolim''

    * Jobs that do not specify a time limit with ''-t'' / ''--time'' are assigned the partition's default time limit
      * See the ([[compute_slurm#resource_limits|Resource Limits]]) section below for the default and maximum time limits

  * **Nodes Added**
    * A node with 126 cores (AMD Epyc 9534), 500GBs of memory, and 1x Nvidia H100 NVL 94GB GPU

----

===== Resources Available =====
The tables below describe the resources available by partition name in the CS slurm cluster.

==== cpu Partition Nodes ====
^ #Nodes ^ CPUs/Node ^ Sockets ^ Mem (GBs)/Node ^ Feature(s) ^
| 6 | 62 | 2 | 1500 | |
| 1 | 14 | 1 | 96 | |
| 1 | 62 | 1 | 122 | |
| 5 | 14 | 1 | 125 | |
| 9 | 26 | 1 | 125 | |
| 1 | 14 | 1 | 500 | |
| 1 | 22 | 1 | 250 | |
| 1 | 22 | 1 | 500 | |
| 3 | 26 | 1 | 500 | |
| 2 | 30 | 2 | 62 | |
| 1 | 30 | 1 | 125 | |
| 2 | 30 | 2 | 125 | |
| 10 | 46 | 2 | 500 | |
| 1 | 62 | 2 | 250 | |
| 1 | 158 | 2 | 252 | |

==== gpu Partition Nodes ====
All available GPU cards are manufactured by **Nvidia**. No **AMD** GPUs are available in slurm.

^ #Nodes ^ CPUs/Node ^ Mem (GBs)/Node ^ GPU Model ^ GPUs/Node ^ GPU Mem (GBs) ^ Feature(s) ^
| 4 | 30 | 125 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
| 3 | 30 | 60 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
| 1 | 30 | 60 | GeForce GTX 1080Ti \\ GeForce GTX 1080 | 1 \\ 2 | 11 \\ 8 | gtx_1080ti \\ gtx_1080 |
| 1 | 30 | 60 | Titan X \\ GeForce GTX 1080 | 3 \\ 1 | 12 \\ 8 | titan_x \\ gtx_1080 |
| 1 | 30 | 60 | Titan X | 4 | 12 | titan_x |
| 3 | 30 | 60 | Tesla P100 | 4 | 16 | tesla_p100 |
| 2 | 70 | 1000 | GeForce RTX 2080ti | 2 | 11 | rtx_2080ti |
| 6 | 30 | 1000 | Quadro RTX 4000 | 4 | 8 | rtx_4000 |
| 1 | 14 | 250 | Quadro RTX 4000 | 4 | 8 | rtx_4000 |
| 1 | 78 | 250 | Quadro RTX 6000 | 8 | 24 | rtx_6000 |
| 2 | 38 | 500 | RTX A4000 | 4 | 16 | a4000 |
| 1 | 222 | 1000 | RTX A4500 | 8 | 20 | a4500 \\ amd_epyc_7663 |
| 1 | 30 | 1000 | A16 | 8 | 16 | a16 |
| 1 | 46 | 122 | A40 | 2 | 48 | a40 |
| 1 | 62 | 1000 | A40 **(1)*** | 4 | 48 | a40 \\ nvlink |
| 1 | 62 | 250 | A40 **(1)*** | 4 | 48 | a40 \\ nvlink |
| 1 | 28 | 250 | A100 | 4 | 40 | a100_40gb \\ amd_epyc_7252 |
| 1 | 254 | 1000 | A100 **(2)*** | 4 | 80 | a100_80gb \\ nvlink \\ amd_epyc_7742 |
| 1 | 126 | 500 | H100 | 1 | 94 | h100_94gb \\ amd_epyc_9534 |

==== nolim Partition Nodes ====
^ #Nodes ^ CPUs/Node ^ Sockets ^ Mem (GBs)/Node ^ Feature(s) ^
| 1 | 38 | 2 | 160 | |
| 1 | 6 | 1 | 62 | |

==== gnolim Partition Nodes ====
All available GPU cards are manufactured by **Nvidia**.

^ #Nodes ^ CPUs/Node ^ Mem (GBs)/Node ^ GPU Model ^ GPUs/Node ^ GPU Mem (GBs) ^ Feature(s) ^
| 2 | 30 | 125 | GeForce GTX 1080 | 4 | 8 | gtx_1080 |
| 2 | 22 | 220 | GeForce GTX 1080 | 2 | 8 | gtx_1080 |
| 5 | 30 | 62 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
| 2 | 30 | 125 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
| 1 | 30 | 112 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
| 2 | 22 | 246 | Titan X | 1 | 12 | titan_x |
| 2 | 18 | 58 | Titan X | 1 | 12 | titan_x |

  * Listed features can be used with the ''-C'' / ''--constraint'' flag to request nodes with a specific processor or GPU model (see the example after this list)
    * For example, ''--constraint="rtx_4000"'' restricts a job to nodes with Quadro RTX 4000 GPUs
  * (1)* Scaled to 192GB by pairing GPUs with NVLink technology that provides a maximum bi-directional bandwidth of 112GB/s between the paired A40s
  * (2)* Scaled to 320GB by aggregating all four GPUs with NVLink technology
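As an illustrative sketch (the GPU count is an arbitrary example), a feature from the tables above can be combined with a GRES request to target a specific GPU model:
<code>
# Example only: request two Quadro RTX 4000 GPUs on a single gpu-partition node
#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH --constraint="rtx_4000"
</code>
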

==== Resource Limits ====
Any number of jobs can be submitted to the cluster; however, the jobs a user may have running concurrently are restricted by the per-partition resource limits listed below. Note, this does not apply to time limits.

In other words, a user can have at most the amount(s) listed below allocated at any one time. Jobs that would exceed a user's currently allocated resources are simply queued until other jobs complete.

Regarding **Default Time Limits**, these are set when jobs do not specify a time limit with the ''-t'' / ''--time'' option. A job may not run longer than the partition's maximum time limit.

^ Partition ^ Max CPU Cores ^ Max Memory ^ Max GPUs ^ Default Time Limit ^ Max Time Limit ^
| cpu | 400 | 4TBs | | 2 Hours | 4 days |
| gpu | 400 | 4TBs | 40 | 2 Hours | 4 days |
| nolim | 80 | 1TB | | 8 Hours | 20 days |
| gnolim | 80 | 1TB | 20 | 8 Hours | 20 days |

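For example (the duration shown is arbitrary), an explicit time limit can be set so the partition's default time limit does not apply:
<code>
# Example: request a 1 day, 12 hour wall time instead of the 2 hour cpu-partition default
#SBATCH -p cpu
#SBATCH -t 1-12:00:00
</code>
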
----

===== Information Gathering =====
Slurm produces a significant amount of information regarding node statuses and job statuses. These are a few of the commonly used tools for querying job data.

==== Viewing Partitions ====
Quick overview of all partitions, all nodes, and their respective statuses. Include ''-l'' (''--long'') for additional detail.
<code>
~$ sinfo
~$ sinfo -l
</code>

Take note of the **TIMELIMIT** column. This describes the maximum **WALL** time that a job may **run** for on a given node. It is formatted as ''days-hours:minutes:seconds''.
<code>
~$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu          up 4-00:00:00    ...    ... ...
...
</code>
This shows a maximum WALL time of 4 days for any job run in the **gpu** partition.


==== Viewing Job Queues ====
To display all jobs that are running, queued, or in another state such as pending (PD)
<code>
~$ squeue
</code>

To display only your jobs that are running, queued, or in another state such as pending (PD), replace ''<userid>'' with your computing ID
<code>
~$ squeue -u <userid>
</code>

==== Node Status ====
A full list of node states and symbol usage can be found in the **__[[https://slurm.schedmd.com/sinfo.html|official slurm documentation (sinfo)]]__**.

At times, a reason can be viewed for why a node is in a certain state such as **DOWN** or **DRAINING/DRAINED**
<code>
~$ sinfo -R
OR
~$ sinfo --list-reasons
REASON               USER      TIMESTAMP           NODELIST
Not responding       ...       ...                 ...
Server Upgrade       ...       ...                 ...
</code>

To view a specific node for all details, including a state reason, replace ''<nodename>'' with the node's name
<code>
~$ scontrol show node <nodename>
</code>

==== Job Status ====
A full list of job states can be found in the **__[[https://slurm.schedmd.com/squeue.html|official slurm documentation (squeue)]]__**.

After submitting a job, be sure to check the state the job enters into. A job will likely enter a **PENDING (PD)** state while waiting for resources to become available, and then begin **RUNNING (R)**. Replace ''<userid>'' with your computing ID
<code>
~$ squeue -u <userid>
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   1234       gpu    myjob   userid PD       0:00      1 (Resources)
   1235       cpu    myjob   userid  R       5:43      1 <nodename>
</code>

==== Individual Node Details (scontrol) ====
Most information found here can also be found via the **SINFO** command and the resources tables above.

However, to view full details about a given node in the cluster, including node state, reasons, and resources allocated/available, replace ''<nodename>'' with the node's name
<code>
~$ scontrol show node <nodename>
NodeName=node0 Arch=x86_64 CoresPerSocket=8
   ... (output continues with CPU, memory, GRES, state, and reason details for the node) ...
</code>

==== Job Accounting ====
A distinction between active (running) and completed jobs is made because utilization information for recently completed jobs is only available through the accounting tools ''sacct'' and ''seff'', not through ''squeue'' or ''sstat''.

=== Active & Completed Jobs ===
Utilize the command ''sacct'' to query the accounting records for your running and recently completed jobs.

Note, time output is formatted as ''days-hours:minutes:seconds''.

To query for all of your recently completed or actively running jobs
<code>
~$ sacct -o "JobID,JobName,State,Elapsed"
       JobID    JobName      State    Elapsed
------------ ---------- ---------- ----------
        1234      myjob    RUNNING   00:05:12
      1234.0      myjob    RUNNING   00:05:12
</code>

To query for a single completed or active job, include ''-j <jobid>'' (replace ''<jobid>'' with the job's ID)
<code>
~$ sacct -j <jobid> -o "JobID,JobName,State,Elapsed"
       JobID    JobName      State    Elapsed
------------ ---------- ---------- ----------
        1234      myjob  COMPLETED   00:42:07
      1234.0      myjob  COMPLETED   00:42:07
</code>

There are no built-in methods in slurm to easily check GPU utilization metrics. Instead, the following can be done to view GPU usage for an actively running job (replace ''<jobid>'' with the job's ID)
<code>
~$ srun --pty --overlap --jobid=<jobid> nvidia-smi
</code>


=== Completed Jobs ===
To view utilization details of a completed job, such as CPU and RAM utilization, replace ''<jobid>'' with the job's ID
<code>
~$ seff <jobid>

Job ID: 123456
Cluster: cs
User/Group: userid/group
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:01:42 core-walltime
Job Wall-clock time: 00:00:51
Memory Utilized: 4.88 MB
Memory Efficiency: 0.06% of 8.00 GB
</code>

----

===== Reservations =====
Slurm reservations are made available when dedicated usage is needed for a subset of nodes and their respective resources from the cluster.

Computing resources are finite, and there is considerable demand for nodes, especially the higher end GPU nodes. __Reserving a node means that you are not allowing anyone else to use the node, denying that resource to others__. So if the support staff sees that a reserved node has sat idle for a few hours, they will delete the reservation. **As a result, reservations are generally discouraged unless one is absolutely necessary, and all resources will be used throughout its duration**.

==== Important Notes ====
  * **__Reservations will be deleted if jobs are not submitted/running on them shortly after they begin__**
  * Reservations should be made in advance when possible
    * Same day reservation requests will not cancel active jobs on the node(s) that provide the resources requested
  * Nodes and their resources may be reserved for an initial **maximum of fourteen days**
    * Extensions may be requested in increments of one week at most

==== Requesting a reservation ====
  * Observe the available resource tables above and determine which node(s) will meet your resource requirements
  * Send a request to the CS support staff that includes the following information:
    - Resources needed
      - Number of Nodes
      - Number of CPUs (per node)
      - Amount of RAM (per node)
      - (optional) Type and number of GPUs
    - Reservation duration (maximum is two weeks)
    - Users that should have access (userids)

==== Using a reservation ====
Include the reservation name in your slurm commands

Be sure to include the slurm QoS ''csresnolim'' as well, otherwise the partition's resource limits will still apply to jobs run in the reservation

Using ''srun'' or ''salloc'' (replace ''<name>'' with the reservation name)
<code>
~$ srun --reservation=<name> --qos=csresnolim ...
~$ salloc --reservation=<name> --qos=csresnolim ...
</code>

Using an ''sbatch'' script
<code>
#SBATCH --reservation=<name>
#SBATCH --qos=csresnolim
</code>

==== Viewing Current Reservations ====
To view existing reservations
<code>
~$ scontrol show res
</code>

----

===== Submitting & Controlling Jobs =====
==== Submitting Jobs ====
To submit a job to run on slurm nodes in the CS cluster, you must be logged into one of the login nodes. Currently, this is the ''portal'' login node cluster. Three commands are used to submit and launch jobs: **salloc**, **srun**, and **sbatch**.

  * **salloc**, when run, will allocate resources/nodes, which can then be used by **srun** ([[https://slurm.schedmd.com/salloc.html|salloc documentation]])
    * One purpose is parallel processing: after allocating multiple nodes, **srun** can execute a command across the allocated nodes
  * **srun** can utilize a resource allocation from **salloc**, or can create one itself ([[https://slurm.schedmd.com/srun.html|srun documentation]])
  * **sbatch** is used to submit a script that describes the resources required and the program execution procedure ([[https://slurm.schedmd.com/sbatch.html|sbatch documentation]])

Sample command execution
<code>
# submit an sbatch script
~$ sbatch mysbatchscript
</code>
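
As a hedged sketch (the node count, time limit, and command are placeholders), ''salloc'' and ''srun'' can be combined to run a command across an allocation:
<code>
# Allocate two nodes in the cpu partition for 30 minutes (example values)
~$ salloc -p cpu -N 2 -t 30

# Run a command once per allocated node within that allocation
~$ srun hostname

# Release the allocation when finished
~$ exit
</code>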

==== Important Job Submission Notes ====
  * If you submit a job and it does not run immediately, it is most likely queued waiting for the requested resources to become available. Check its state and pending reason with ''squeue'' (see the example below)

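For example (the job ID and values shown are placeholders), the pending reason appears in the last column of ''squeue'' output:
<code>
~$ squeue -j <jobid>
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   1234       gpu    myjob   userid PD       0:00      1 (Resources)
</code>
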
==== Common Job Options ====
Flags and parameters are passed directly to **salloc** and **srun** via the command line, or can be defined in an **sbatch** script by prefixing the flag with ''#SBATCH''.

This list contains several of the common flags to request resources when submitting a job. **//__When submitting a job, you should avoid specifying a hostname, and instead specify required resources__//** so the scheduler can place the job on any node that satisfies them.

Note, the syntax ''<n>'' below indicates a value that you supply; the angle brackets are not typed literally.
<code>
-J <jobname> or --job-name=<jobname>   Name the job

-N <n> or --nodes=<n>                  Number of nodes to allocate

-n <n> or --ntasks=<n>                 Total number of tasks (processes) to run

--ntasks-per-node=<n>                  Number of tasks to run on each node

--ntasks-per-core=<n>                  Number of tasks to run on each core

--ntasks-per-gpu=<n>                   Number of tasks to run per allocated GPU

-p <partition> or --partition=<partition>   Partition to submit the job to
      (cpu, gpu, nolim, gnolim)

-c <n> or --cpus-per-task=<n>   Number of cores allocated per task,
      primarily for multithreaded jobs,
      default is one core per process

--mem=<n>   Amount of memory required per node, specified in MBs by default
      (Ex. --mem=4000 requires 4000MBs of memory per node)
      (Ex. --mem=4G requires 4GBs = 4096MBs of memory per node)
      (Note, --mem=0 requires ALL memory to be available on
      a node for your job to run. It is recommended to avoid
      --mem=0, since it requires a node to be completely
      idle (i.e. no other jobs running) to process your job)

--mem-per-cpu=<n>   Amount of memory required per allocated CPU core,
      specified in MBs by default

--mem-per-gpu=<n>   Amount of memory required per allocated GPU,
      specified in MBs by default

-t D-HH:MM:SS or --time=D-HH:MM:SS   Total WALL time limit for the job,
      should always be specified for interactive jobs.
      Estimate high, adding at least six hours more than you
      expect your program to run for.
      Defaults to the partition's default time limit if not set.
      Cannot exceed the partition's maximum WALL time.

--gres=<resource>:<n>   Generic resources (GPUs) required per node
      (Ex. --gres=gpu:1 requests one GPU of any model)
      (Ex. --gres=gpu:4 requests four GPUs on the same node)
      Combine with --constraint to request a specific GPU model

-C <features> or --constraint=<features>   Node features required for the job.
      Logical operators are valid
      OR:  |
      AND: &

      (Ex. --constraint="rtx_4000" requires a node with a Quadro
      RTX 4000 GPU available)

      (Ex. --constraint="amd_epyc_7742&a100_80gb" requires
      a node that has an AMD Epyc processor AND
      an A100 80GB GPU available)

      (Ex. --constraint="a100_80gb|a100_40gb" requires
      a node that has an A100 80GB OR an A100 40GB GPU available)

--mail-type=<type>   Email notification event type(s).
      Valid types are: none, begin, end, fail, requeue, all
      (Ex. --mail-type=begin,end sends an email when the job
      starts and when it finishes)

--mail-user=<userid>@virginia.edu   Email address to receive the
      notifications, must be a virginia.edu address
      (all other domains such as 'gmail.com' are silently ignored)
</code>

  * Note, ''--mem'', ''--mem-per-cpu'', and ''--mem-per-gpu'' are mutually exclusive; specify only one of them per job

  * Note, for memory specifications, values are interpreted as MBs unless a unit suffix such as ''G'' is given

  * Note, when allocating memory with ''--mem-per-cpu'' or ''--mem-per-gpu'', the total memory allocated is the per-unit value multiplied by the number of CPUs or GPUs requested

  * A full list of **constraint options** (node features) can be found in the Feature(s) columns of the resource tables above


==== Environment Variables ====
A full list of slurm environment variables can be found in the **__[[https://slurm.schedmd.com/sbatch.html|official slurm documentation (sbatch)]]__**. By default, your current shell environment is exported to the job.

To not carry environment variables from your shell forward
<code>
#SBATCH --export=NONE
</code>

To export individual variables
<code>
#SBATCH --export=var0,var1
</code>

Variables can be set and exported with
<code>
#SBATCH --export=var0=value0,var1=value1
</code>

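As a small illustrative sketch (the option values are arbitrary), slurm also sets job-specific environment variables that a batch script can read at run time:
<code>
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 4
#SBATCH -p cpu

# SLURM_JOB_ID and SLURM_CPUS_PER_TASK are set by slurm for the running job
echo "Job $SLURM_JOB_ID is using $SLURM_CPUS_PER_TASK cores on $HOSTNAME"
</code>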

==== Log files: Standard Output (STDOUT) and Standard Error (STDERR) ====
slurm by default aggregates the STDOUT and STDERR streams into a single file, which by default will be named ''slurm-<jobid>.out'' and written to the directory the job was submitted from.

See more information about **__[[https://slurm.schedmd.com/sbatch.html|filename patterns in the official slurm documentation]]__**.

Modify combined output file name
<code>
#SBATCH -o <output file name>

or

#SBATCH --output=<output file name>
</code>

To separate the output files, define both file names for STDOUT and STDERR
<code>
#SBATCH --output=<output file name>
#SBATCH -e <error file name>

or

#SBATCH --output=<output file name>
#SBATCH --error=<error file name>
</code>

A path can be specified for either option, and patterns can be used in the file name as well (''%x'' is the job name, ''%j'' is the job ID)
<code>
#SBATCH --output="/path/to/dir/%x-%j.out"
</code>

==== Interactive Job ====
**Direct SSH connections are disabled** to slurm nodes. Instead, an interactive job can be initialized to run commands directly on a node.

Note, idle interactive sessions deny resource allocations for other jobs. **As such, idle interactive jobs, when found, are terminated.** Generally, interactive sessions should be used for testing and debugging with the intention of creating an SBATCH script to run your job.

Note, an interactive session will **time out after one hour** if no commands are executed during the hour.

The following example creates a resource allocation within the cpu partition for one node with two cores, 4GBs of memory, and a time limit of 30 minutes. Then, a BASH shell is initialized within the allocation:
<code>
userid@portal01~$ salloc -p cpu -c 2 --mem=4000 -J InteractiveJob -t 30
salloc: Granted job allocation 12345

userid@portal01~$ srun --pty bash -i -l --
userid@node01~$

... testing performed ...

userid@node01~$ exit
userid@portal01~$ exit
salloc: Relinquishing job allocation 12345
salloc: Job allocation 12345 has been revoked.

userid@portal01~$
</code>


**Be sure to type ''exit'' twice when finished: once to leave the shell on the node, and once more to relinquish the ''salloc'' job allocation.**

==== View Active Job Details ====
To view all of your job(s) and their respective statuses (replace ''<userid>'' with your computing ID)
<code>
~$ squeue -u <userid>
</code>

To view individual job details in full
<code>
~$ scontrol show job <jobid>
</code>

To view full job utilization and allocation details of a running job
<code>
~$ sstat <jobid>
</code>

An estimate of a job's start time can sometimes be obtained by running
<code>
~$ squeue --start -j <jobid>
</code>

==== Canceling Jobs ====
To obtain job IDs, be sure to utilize the ''squeue -u <userid>'' command described above.

To cancel an individual job
<code>
~$ scancel <jobid>
</code>

To send a different signal to the job's processes (the default is SIGTERM/15)
<code>
~$ scancel --signal=KILL <jobid>
</code>

To cancel all of your jobs regardless of state
<code>
~$ scancel -u <userid>
</code>

To cancel all of your PENDING (PD) jobs, include the ''-t PD'' flag
<code>
~$ scancel -u <userid> -t PD
</code>

To completely restart a job, that is, to cancel it and restart it from the beginning
<code>
~$ scontrol requeue <jobid>
</code>

==== Email Notifications ====
Email notifications are available for various job state changes, for example, when a job starts, completes, or fails. The scheduler checks **__every five minutes__** for emails to send out.

When receiving an email for completed jobs, the ''seff'' utilization summary for the job is typically included in the message body.

To enable email notifications, include the following directives in your sbatch script (they can also be passed to ''srun''/''salloc'' on the command line)
<code>
#SBATCH --mail-type=<type>
#SBATCH --mail-user=<userid>@virginia.edu
</code>

Replace ''<type>'' with one or more of the valid types (begin, end, fail, requeue, all) and ''<userid>'' with your computing ID. Only **virginia.edu** addresses will receive notifications.

Note, all other email domains such as **gmail.com** are silently ignored.

Note, for job arrays, the scheduler only generates a single email for the array as a whole, not one email per array task.

For example, to receive a notification when a job starts and finishes
<code>
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<userid>@virginia.edu
</code>

Official documentation for SLURM emailing functionality can be found **__([[https://slurm.schedmd.com/sbatch.html|here (sbatch --mail-type)]])__**.

----

===== Example SBATCH Scripts =====
**sbatch** scripts are sequentially executed command scripts. These are simple **bash** scripts in which slurm parameters are defined at the top with ''#SBATCH'' directives, followed by the commands to run.

When submitting an sbatch script, be aware of file paths. An sbatch script will use the current directory from where it was submitted.

To submit an sbatch script from a login node, replace ''<script file name>'' with your script's filename
<code>
~$ sbatch <script file name>
</code>

The examples below are meant to give a starting point for creating a job script. It is recommended to modify them as needed for your job(s).

=== Single Process Program ===
The following is an example of a single process job that runs on the cpu partition. This will allocate the default amount of memory per core for the cpu partition, which is only 256MBs
<code>
#!/bin/bash

#SBATCH -n 1
#SBATCH -t 04:00:00
#SBATCH -p cpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<userid>@virginia.edu

./myprogram
</code>

=== Simple GPU Program (allocates first available GPU) ===
The following allocates the first available GPU, regardless of model/type, along with 8GBs of CPU memory
<code>
#!/bin/bash

#SBATCH --gres=gpu:1
#SBATCH --mem=8000
#SBATCH -n 1
#SBATCH -t 04:00:00
#SBATCH -p gpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<userid>@virginia.edu

module purge
module load gcc
module load python
module load cuda

python3 myprogram.py
</code>

=== Simple GPU Program (allocates a specific GPU) ===
The following requests a single A100 GPU card with 40GBs of memory when one is available, along with 16GBs of CPU memory
<code>
#!/bin/bash

#SBATCH --gres=gpu:1
#SBATCH --constraint="a100_40gb"
#SBATCH --mem=16000
#SBATCH -n 1
#SBATCH -t 04:00:00
#SBATCH -p gpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<userid>@virginia.edu

module purge
module load gcc
module load python
module load cuda

python3 myprogram.py
</code>

Learn more about allocating GPUs for your job in the **__[[https://slurm.schedmd.com/gres.html|official slurm GRES documentation]]__**.

==== Simple Parallel Program ====
For jobs that will utilize the Message Passing Interface (MPI), several nodes/tasks can be allocated and ''srun'' used to launch the MPI program across them.

The following requests two servers, each to run 8 tasks, for a total of 16 tasks.
<code>
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH -t 06:00:00
#SBATCH -p cpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<userid>@virginia.edu

module purge
module load gcc
module load openmpi

# replace ./my_mpi_program with the path to your compiled MPI program
srun ./my_mpi_program
</code>

==== Job Arrays ====
An option in slurm is to submit several jobs simultaneously for processing separate data, using a job array.

The following requests a total of 16 array tasks, wherein each individually requires 2 CPUs and 1GB of memory. The resulting output files have the format ''examplearray_<jobid>-<taskid>.out'' (''%A'' is the array job ID and ''%a'' is the array task index)
<code>
#!/bin/bash

#SBATCH --job-name=myjobarray
#SBATCH --array=1-16
#SBATCH --ntasks=1
#SBATCH -c 2
#SBATCH --mem=1000
#SBATCH -t 00:04:00
#SBATCH -p cpu
#SBATCH --output=examplearray_%A-%a.out
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<userid>@virginia.edu

echo "Hello world from $HOSTNAME, slurm taskid is: $SLURM_ARRAY_TASK_ID"
</code>

The following example uses dummy option input parameters from a file, one line of options per array task (the file name ''params.txt'' used below is only an example)
<code>
--input_0=123 --normalize=True
--input_0=321 --normalize=False --seed=8
</code>

Then, the following SBATCH script can be used. ''$SLURM_ARRAY_TASK_ID'' selects the matching line of the parameter file for each array task
<code>
#!/bin/bash
#
#SBATCH --job-name=myjobarray
#SBATCH --array=1-2
#SBATCH --ntasks=1
#SBATCH --mem=1000
#SBATCH -t 00:02:00
#SBATCH -p cpu
#SBATCH --output=examplearray_%A-%a.out
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<userid>@virginia.edu

# set up modules and/or environment if needed
module purge
module load ...

# read line number $SLURM_ARRAY_TASK_ID from the parameter file
OPTS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)

python my_python_script $OPTS
</code>

For the first array task, ''$SLURM_ARRAY_TASK_ID'' is 1, so ''$OPTS'' holds the first line of the file; for the second task it is 2 and the second line is used, and so on.

==== Job Ordering (Dependencies) ====
Job dependencies allow for defining a condition that specifies when a job should run, using the ''-d'' / ''--dependency'' option. Only one separator is allowed: if conditions are separated with a comma '','', all conditions must be satisfied before the job starts; if separated with ''?'', satisfying any one condition is sufficient.

For example, to specify that ''job2'' should start only after ''job1'' completes successfully, submit ''job2'' with ''--dependency=afterok:<job1 jobid>''.

To specify that a job should start after two others have finished (regardless of their exit status, use ''afterany''): ''--dependency=afterany:<jobid1>:<jobid2>''.

For further reading, see the **([[https://slurm.schedmd.com/sbatch.html|official slurm documentation for the --dependency option]])** and the sketch below.

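A minimal sketch (the script names are placeholders): submitting a post-processing job that waits for a preceding job to finish successfully:
<code>
# Submit the first job and note the job ID that sbatch prints
~$ sbatch first_step.sbatch
Submitted batch job 1234

# Submit the dependent job; it remains PENDING with reason "Dependency"
# until job 1234 completes successfully
~$ sbatch --dependency=afterok:1234 post_process.sbatch
</code>
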

=== Canceling Array Tasks ===
To cancel one single task from an array
<code>
~$ scancel <jobid>_<taskid>
</code>

To cancel a range of tasks
<code>
~$ scancel <jobid>_[<first taskid>-<last taskid>]
</code>

A list format can also be provided
<code>
~$ scancel <jobid>_[<taskid>,<taskid>,<taskid>]
</code>

----