GPUs on MONTAGE

On most nodes, GPUs are partitioned into Multi-Instance GPU (MIG) slices so that more users can access GPU resources; note that performance is divided along with the hardware. However, the A100 GPUs in nodes mum-hpc2-gpu[5-6] do not have MIG enabled and are reserved for users with urgent needs. They are accessible only through the Slurm highprio partition.

Our cluster provides a mix of Shared (MIG) and Dedicated (Full) GPU resources.

1. Mixed-Use Nodes: mum-hpc2-gpu[1-4]

These nodes are configured to support both small interactive tasks and medium-sized batch jobs. Each node contains 4 physical A100s, which are split into six logical slices:

| Slice Name | GRES Type | Available VRAM | Quantity per Node | Total VRAM | Slurm example |
|---|---|---|---|---|---|
| Small Slice | 3g.39gb | 40 GB | 4 | 160 GB | `--gres=3g.39gb:1` |
| Full Slice | 7g.79gb | 80 GB | 2 | 160 GB | `--gres=7g.79gb:1` |
| **Node Total** | | | 6 slices | 320 GB | |

Note: On these nodes, you request a slice, not a physical GPU number.
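
For quick interactive work on these nodes, a slice can be requested directly with `srun`. This is a sketch using standard Slurm options; the one-hour time limit is an illustrative placeholder, not a site policy:

```shell
# Start an interactive shell with one Small Slice (3g.39gb) allocated.
# --time is an example value; adjust it to your needs.
srun --partition=gpu --gres=3g.39gb:1 --time=01:00:00 --pty bash
```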


2. Full-Power Nodes: mum-hpc2-gpu[5-6]

These nodes are strictly for high-performance workloads requiring maximum GPU memory and NVLink interconnectivity. You must submit to the highprio partition to use them.

| Resource Type | GRES Type | Available VRAM | Quantity per Node | Total VRAM | Slurm example |
|---|---|---|---|---|---|
| Full (No MIG) | a100 | 80 GB | 4 | 320 GB | `--gres=a100:1` |
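
A batch script for these nodes might begin like the following sketch; the time limit is an illustrative placeholder:

```shell
#!/usr/bin/env bash

#SBATCH --partition=highprio
#SBATCH --gres=a100:4        # request all four full A100s on one node
#SBATCH --time=12:00:00      # example time limit; adjust as needed

# The allocated NVLink-connected GPUs are visible inside the job.
nvidia-smi
```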

How do I use GPUs in my jobs?

To access GPUs, submit to the gpu partition (or highprio for the full-power nodes). If you have a Slurm script file, add the following #SBATCH configuration lines at the top of your script.

Things to specify in your script:

  • --partition=[PARTITION]
    • [PARTITION] is either gpu or highprio
  • --gres=[GRES]:[NO]
    • [GRES] is the GRES type from the tables above.
    • [NO] is the number of MIG slices or physical GPUs required.

For example:

#!/usr/bin/env bash

#SBATCH --partition=gpu
#SBATCH --gres=3g.39gb:1

[... the rest of your code ...]
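
Once saved (here as job.sh, an arbitrary example filename), the script is submitted with the standard Slurm commands:

```shell
sbatch job.sh   # submit the batch script to the scheduler
squeue --me     # check the status of your queued and running jobs

# Inside the job itself, nvidia-smi -L lists the allocated GPU or MIG device,
# which is a quick way to confirm the --gres request took effect.
```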