Slurm Help

Slurm Documentation

The HPC Team cannot possibly keep up with the constant stream of updates to Slurm and its documentation. General information about Slurm commands, operation, and use can be found in the pages listed below. Remember, always defer to SchedMD’s website first for the most up-to-date and relevant documentation.

Official Documentation

Non-official Documentation

DEAC Cluster Slurm Specifics

The DEAC Cluster has a few configuration specifics that distinguish it from a default Slurm installation. They are listed below.

Example

Below is an example job that can run on the DEAC Cluster. Normally an --account= directive would be included, but this example omits it, so the default account is used. Including an account specification is highly recommended, especially for users who belong to multiple research groups.

#!/bin/bash
#SBATCH --job-name="Example_Submission"
#SBATCH --partition=small
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=5GB
#SBATCH --time=00-00:05:00
#SBATCH --mail-user=%u@wfu.edu
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --output=slurm-%x-%j.o

echo "Running Job $SLURM_JOB_ID"  #Print Slurm Job ID

pwd                               #Print current working directory
cd /home/$USER                    #Go to homedir and print so you see change
pwd
cd /scratch/$SLURM_JOB_ID         #Change to temp scratch dir
pwd

which python3                     #Show default python3 path
python3 -V                        #Show default python3 version

module load apps/python/3.11.8    #Load python3 modulefile
which python3                     #Show updated python3 path
python3 -V                        #Show updated python3 version
module list                       #Show loaded modules

hostname                          #Print compute node hostname where job ran
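To try the example, save it to a file and submit it from a login node. The sketch below assumes the hypothetical file name example_job.sh and the hypothetical account carlsonGrp (from the Accounts section); substitute your own script name and group account.

```shell
# Submit the example job using your default account.
# (example_job.sh is a hypothetical file name for the script above.)
sbatch example_job.sh

# Alternatively, name the account explicitly at submission time.
# (carlsonGrp is a hypothetical account; use your group's account.)
sbatch --account=carlsonGrp example_job.sh

# Check the state of your queued and running jobs.
squeue -u "$USER"
```

Because of the --output=slurm-%x-%j.o directive, the job's output is written to a file named like slurm-Example_Submission-<jobid>.o in the directory the job was submitted from.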

Accounts

Each research group corresponds to a shared Slurm account used to track utilization. For example, Engineering Professor Adam Carlson would have a “carlsonGrp” Slurm account that he and all of his sponsored researchers use when submitting jobs to Slurm. The account is specified with the --account= directive in a batch job submission.

Each Slurm account inherits its priority from its parent department account. In this case, carlsonGrp would inherit its priority from the “egr” parent account. This matters because all child accounts under egr affect one another's overall priority. The same applies to the child accounts of every other department.
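A minimal sketch for checking which Slurm accounts you can submit under, and for naming one in a batch script. The account name carlsonGrp is the hypothetical example used above.

```shell
# List the accounts associated with your user (run on a login node).
sacctmgr show associations user="$USER" format=Account,User

# Inside a batch script, before any executable command, select the account with:
#   #SBATCH --account=carlsonGrp
```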

Partitions

The DEAC Cluster has 4 primary partitions:

  • large - Jobs > 1 node, <180 days; the default partition.

  • small - Jobs = 1 node, <1 day; receives double the partition priority of large.

  • gpu - Jobs <= 2 nodes, <28 days; only partition with GPU resources.

  • interactive - Jobs = 1 node, <1 day; all interactive jobs run here.

The small, large, and interactive partitions share the same nodes. The only differences are the limits placed on running jobs and the priority assigned to each job at submission. The gpu partition consists of GPU nodes, which can also be found in the interactive partition.
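Partition limits can change, so it may be worth checking them directly with sinfo. The sketch below also shows requesting the small partition in a batch script; the sinfo format string is one reasonable choice, not a site requirement.

```shell
# Show each partition, its time limit, and its node count.
sinfo --partition=small,large,gpu,interactive --format="%P %l %D"

# In a batch script, request the small partition for a short single-node job:
#   #SBATCH --partition=small
#   #SBATCH --nodes=1
#   #SBATCH --time=12:00:00
```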

Node Features

Because the DEAC Cluster is heterogeneous, we use node features to identify differences between node types. Features can be referenced with the --constraint= directive in a batch job submission. Valid features and constraint options are as follows:

  • login: These nodes are used to submit jobs and are not assigned to any partition to execute jobs.

  • amd : These nodes contain AMD cores (64-core nodes).

  • zen# : Designates the revision of the AMD Zen core architecture (the higher the number, the newer the architecture).

  • intel : These nodes contain Intel cores.

  • skylake : These nodes have Intel’s Xeon E5 Skylake-based processors (44-core UCS nodes).

  • cascade : These nodes have Intel’s Xeon Gold Cascade Lake-based processors (44- and 48-core UCS nodes).

  • rocky9 : Designates the operating system installed on the node (Rocky Linux 9).

  • 44cores : Designates 44 cores available on the node.

  • 48cores : Designates 48 cores available on the node.

  • 64cores : Designates 64 cores available on the node.

  • highmem : Designates a high memory limit (currently 2.3TB) on the node.

  • gpu : Designates that GPUs are available (suboptions: a100_80, a100_40, v100_32).
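Slurm constraint expressions can combine features with & (AND) and | (OR). The batch-script header below is a sketch using feature names from the list above: only one constraint line is active, and the ##SBATCH lines are disabled alternatives, since sbatch only reads lines that begin with exactly #SBATCH.

```shell
#!/bin/bash
# Active constraint: run on any AMD node.
#SBATCH --constraint=amd
# Disabled alternative: Intel Cascade Lake nodes only (AND of two features).
##SBATCH --constraint="intel&cascade"
# Disabled alternative: nodes with either 44 or 48 cores (OR of two features).
##SBATCH --constraint="44cores|48cores"
```

To see which features each node actually reports, run sinfo --format="%N %f" on a login node.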

Priority Calculation

The Priority Calculation equation used by the DEAC Cluster for each job is as follows:

\[\begin{split}Priority_{\mathrm{Job}} = & ( PriorityWeight_{\mathrm{Fairshare}} * 1000 ) + \\ & ( PriorityWeight_{\mathrm{Age}} * 3000 ) + \\ & ( PriorityWeight_{\mathrm{Partition}} * 500 ) + \\ & ( PriorityWeight_{\mathrm{QOS}} * 3000 ) - Factor_{\mathrm{Nice}}\end{split}\]

The priority weights are determined as follows:

  • Fairshare = Based upon a leveled Department Fairshare (\(\mathbf{F_{\mathrm{Dept}}}\)) starting value, and adjusted by Slurm based on monthly utilization compared to expected baseline.

  • Age = Slurm-assigned value based on wait time (up to a 7-day maximum; up to 100 jobs per group simultaneously).

  • Partition = DEAC partition values as follows: small=20; large=10; gpu=40; all others=10.

  • QOS = 0 for normal QOS (default), and 10 for any high QOS (only available for contributors).

  • Nice Factor = A way to manually adjust job importance by a weight of up to +/-2147483645 (via the --nice directive). A positive value lowers priority; only admins can assign a negative value to increase priority.
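The formula can be reproduced with a small shell function. This is a sketch, not Slurm's own implementation: job_priority is a hypothetical helper, the weights 1000, 3000, 500, and 3000 are hard-coded from the formula above, and each factor argument is assumed to already be normalized to the 0 to 1 range that Slurm's multifactor priority plugin uses internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute a DEAC job priority from normalized factors.
# Usage: job_priority <fairshare> <age> <partition> <qos> [nice]
# Factors are assumed to lie between 0 and 1; nice defaults to 0.
job_priority() {
  local fairshare=$1 age=$2 partition=$3 qos=$4 nice=${5:-0}
  # awk handles the fractional factors; %d truncates the result to an integer.
  awk -v f="$fairshare" -v a="$age" -v p="$partition" -v q="$qos" -v n="$nice" \
    'BEGIN { printf "%d\n", (f * 1000) + (a * 3000) + (p * 500) + (q * 3000) - n }'
}
```

For example, job_priority 1 0 1 0 prints 1500, and job_priority 1 0 1 1 prints 4500.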

The higher the overall calculated value, the higher the priority. The most complicated aspect of this calculation is called “leveled fairshare”, where Slurm takes the standard assigned integer value and levels it on a scale of 0 to 1. Slurm's multifactor priority plugin similarly normalizes the age, partition, and QOS values to a scale of 0 to 1 before multiplying them by their weights; for simplicity, the examples below use a partition factor of 1 and a high-QOS factor of 1. The first example uses a new user (leveled fairshare of 1) who submits a job via their normal QOS to the large partition:

\[\begin{split}Priority_{\mathrm{Job}} = & ( 1 * 1000 ) + \\ & ( 0 * 3000 ) + \\ & ( 1 * 500 ) + \\ & ( 0 * 3000 ) - 0 \\ = & 1500\end{split}\]

If the user's group has made a contribution and the user submits a job via their high QOS to the large partition, the calculation is as follows:

\[\begin{split}Priority_{\mathrm{Job}} = & ( 1 * 1000 ) + \\ & ( 0 * 3000 ) + \\ & ( 1 * 500 ) + \\ & ( 1 * 3000 ) - 0 \\ = & 4500\end{split}\]

This shows that, from the same starting point, a contributing group's job submitted via the high QOS receives three times the priority of the same job submitted via the normal QOS.

If a non-contributing user has waited 7 days for their job to start (the maximum age factor), their job's priority will have increased to match that of the high QOS:

\[\begin{split}Priority_{\mathrm{Job}} = & ( 1 * 1000 ) + \\ & ( 1 * 3000 ) + \\ & ( 1 * 500 ) + \\ & ( 0 * 3000 ) - 0 \\ = & 4500\end{split}\]

This time-based increase helps ensure a level of balance so that non-contributing users can still have jobs run after a certain amount of wait time.
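To compare these calculations with what Slurm actually assigns, the standard sprio and sshare commands report live values. A minimal sketch; note that sprio only lists pending jobs, and the exact columns depend on the cluster's configuration.

```shell
# Print the priority weights configured on the cluster.
sprio --weights

# Show the priority factors for your own pending jobs.
sprio --user="$USER"

# Show fairshare details for your user; on clusters using Slurm's Fair Tree
# algorithm, the long output may include a LevelFS (leveled fairshare) column.
sshare --long --users="$USER"
```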