GPUs on MONTAGE

On most nodes, GPUs are partitioned into Multi-Instance GPU (MIG) slices so that more users can access GPU resources; note that performance is divided along with the hardware. However, the A100 GPUs in nodes mum-hpc2-gpu[5-6] do not have MIG enabled and are reserved for users with urgent needs. They are accessible only through the Slurm highprio partition.

Our cluster provides a mix of Shared (MIG) and Dedicated (Full) GPU resources.

1. Mixed-Use Nodes: mum-hpc2-gpu[1-4]

These nodes are configured to support both small interactive tasks and medium-sized batch jobs. Each node contains 4 physical A100s, which are split into six logical slices:

| Slice Name | GRES Type | Available VRAM | Quantity per Node | Total VRAM | Slurm example |
|---|---|---|---|---|---|
| Small Slice | 3g.39gb | 40 GB | 4 | 160 GB | `--gres=3g.39gb:1` |
| Full Slice | 7g.79gb | 80 GB | 2 | 160 GB | `--gres=7g.79gb:1` |
| **Node Total** | | | 6 slices | 320 GB | |

Note: On these nodes, you request a slice, not a physical GPU number.
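
For quick interactive work on these nodes, a slice can be requested directly with `srun`. This is a sketch using standard Slurm options; the one-hour time limit is an illustrative placeholder, not a site policy:

```shell
# Start an interactive shell with one Small Slice (3g.39gb) allocated.
# --time is an example value; adjust it to your needs.
srun --partition=gpu --gres=3g.39gb:1 --time=01:00:00 --pty bash
```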


2. Full-Power Nodes: mum-hpc2-gpu[5-6]

These nodes are strictly for high-performance workloads requiring maximum GPU memory and NVLink interconnectivity. You must submit to the highprio partition to use them.

| Resource Type | GRES Type | Available VRAM | Quantity per Node | Total VRAM | Slurm example |
|---|---|---|---|---|---|
| Full (No MIG) | a100 | 80 GB | 4 | 320 GB | `--gres=a100:1` |
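
A batch script for these nodes might begin like the following sketch; the time limit is an illustrative placeholder:

```shell
#!/usr/bin/env bash

#SBATCH --partition=highprio
#SBATCH --gres=a100:4        # request all four full A100s on one node
#SBATCH --time=12:00:00      # example time limit; adjust as needed

# The allocated NVLink-connected GPUs are visible inside the job.
nvidia-smi
```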

How do I use GPUs in my jobs?

To access GPUs, submit to the gpu partition (or highprio for the full-power nodes). If you have a Slurm script file, add the following #SBATCH configuration lines at the top of your script.

Things to specify in your script:

  • --partition=[PARTITION]
    • [PARTITION] is either gpu or highprio
  • --gres=[GRES]:[NO]
    • [GRES] is the GRES type from the tables above.
    • [NO] is the number of MIG slices or physical GPUs required.

For example:

#!/usr/bin/env bash

#SBATCH --partition=gpu
#SBATCH --gres=3g.39gb:1

[... the rest of your code ...]
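
Once saved (here as job.sh, an arbitrary example filename), the script is submitted with the standard Slurm commands:

```shell
sbatch job.sh   # submit the batch script to the scheduler
squeue --me     # check the status of your queued and running jobs

# Inside the job itself, nvidia-smi -L lists the allocated GPU or MIG device,
# which is a quick way to confirm the --gres request took effect.
```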