Rorqual

Availability: June 19, 2025
Login node: rorqual.alliancecan.ca
Data transfer node (rsync, scp, sftp, ...): rorqual.alliancecan.ca
Automation node: robot.rorqual.alliancecan.ca
Globus Collection: alliancecan#rorqual
JupyterHub: jupyterhub.rorqual.alliancecan.ca
Portal: metrix.rorqual.alliancecan.ca
Webinar: slides, video

Rorqual is a heterogeneous and versatile cluster designed for a wide variety of scientific computations. Built by Dell Canada and CDW Canada, Rorqual is located at the École de technologie supérieure. Its name recalls the rorqual, a marine mammal whose several species, for example the minke whale and the blue whale, have been observed in the waters of the St. Lawrence River.

Access

To access the compute cluster, each researcher must complete an access request in the form found via Resources > System Access in the CCDB menu bar. In this form:

  1. Select 'Rorqual' from the left list.
  2. In the first box on the right, select the access request.
  3. Then accept all specific agreements with Calcul Québec:
    • Consent for the Collection and Use of Personal Information,
    • Rorqual Service Level Agreement,
    • Terms of Use.

Warning

Actual access to the cluster may take up to one hour after completing the access request.

Specifics

Warning

By policy, Rorqual compute nodes do not have internet access. To request an exception, please contact technical support and explain your needs and reasons. Note that the crontab tool is not available.

Each job should be at least one hour long (at least five minutes for test jobs), and you cannot have more than 1000 jobs (running and pending) at a time. The maximum job duration is 7 days (168 hours).
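For reference, a minimal batch script respecting these limits might look as follows (the account name def-someuser is a placeholder; replace it with your own allocation):

```shell
# Write a minimal job script; def-someuser is a placeholder account name.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser   # replace with your allocation
#SBATCH --time=01:00:00          # at least one hour; maximum is 7 days (168:00:00)
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
echo "Running on $(hostname)"
EOF
# Submit with: sbatch job.sh
```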

Storage

HOME
  • Lustre filesystem, 116 TB total space
  • Small, fixed quotas per user; this space cannot be expanded, so use your project space for large storage needs
  • Automatic daily backup
SCRATCH
  • Lustre filesystem, 6.5 PB total space
  • Accessible via the symbolic link $HOME/links/scratch
  • Large space for storing temporary files during computations
  • Large, fixed quotas per user
  • No automatic backup; an automatic purging policy removes old files from this space
PROJECT
  • Lustre filesystem, 62 PB total space
  • Accessible via the symbolic link $HOME/links/projects/project-name
  • Designed for data sharing among group members and for storing large amounts of data
  • Large, adjustable quotas per project
  • Automatic daily backup

The table at the very beginning of this page lists several connection addresses. For data transfers via Globus, use the Globus collection. For tools like rsync and scp, use the Data transfer node address.
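For tools like rsync, a transfer to the data transfer node could be sketched as follows (the username someuser and both paths are placeholders):

```shell
# Sketch of a transfer command targeting the data transfer node.
# "someuser" and the paths are placeholder values; adjust them to your account.
DTN="rorqual.alliancecan.ca"
CMD="rsync -avP results/ someuser@${DTN}:scratch/results/"
echo "$CMD"   # print the command instead of running it in this sketch
```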

High-Performance Networking

  • InfiniBand Networking
    • HDR 200Gbit/s
    • Maximum blocking factor: 34:6 or 5.667:1
    • Size of CPU node islands: up to 31 nodes of 192 cores that can communicate without blocking.

Node Specifications

nodes | cores | available memory | storage | CPU | GPU
670 | 192 | 750G or 768000M | 1 x 480GB SATA SSD (6Gbit/s) | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | -
8 | 192 | 750G or 768000M | 1 x 3.84TB NVMe SSD | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | -
8 | 192 | 3013G or 3086250M | 1 x 480GB SATA SSD (6Gbit/s) | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | -
93 | 64 | 498G or 510000M | 1 x 3.84TB NVMe SSD | 2 x Intel Xeon Gold 6448Y @ 2.10 GHz, 60MB L3 cache | 4 x NVidia H100 SXM5 (80GB memory), connected via NVLink
  • To get a larger $SLURM_TMPDIR space, you must request --tmp=xG, where x is a value between 370 and 3360.
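As a sketch, a job script requesting a larger local scratch space and using it via $SLURM_TMPDIR might look like this (the account name, duration, and input archive are placeholders):

```shell
# Job script sketch requesting 370 GB of node-local scratch space.
# def-someuser and dataset.tar are placeholder values.
cat > tmp_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --time=03:00:00
#SBATCH --tmp=370G               # x between 370 and 3360 for a larger $SLURM_TMPDIR
cd "$SLURM_TMPDIR"               # fast node-local storage for temporary files
tar -xf ~/scratch/dataset.tar    # hypothetical input archive
EOF
```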

CPU Node Topology

In a CPU node, the 192 cores and the different memory spaces are not equidistant, which causes variable data-access latencies (on the order of nanoseconds). In each node, we have:

  • Two (2) CPU sockets, each with 12 system memory channels.
    • Four (4) NUMA nodes per CPU socket, each connected to three (3) system memory channels.
      • Three (3) chiplets per NUMA node, each with its own 32 MiB L3 cache.
        • Eight (8) cores per chiplet, each with its own 1 MiB L2 and 32+32 KiB L1 cache.

In other words, we have:

  • Groups of 8 closely located cores that share the same L3 cache, which is ideal for multithreaded parallel programs (for example, with the --cpus-per-task=8 option).
  • NUMA nodes of 3×8 = 24 cores that share a trio of system memory channels.
  • A total of 2×4×3×8 = 192 cores per node.

To fully benefit from this topology, you must reserve full nodes (for example, with --ntasks-per-node=24 --cpus-per-task=8) and explicitly control the placement of processes and execution threads. Depending on the parallel program and the number of cores used, the gains can be marginal or significant.
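The recommendations above can be sketched as a hybrid MPI/OpenMP job script, with one task per 8-core chiplet so each task's threads share an L3 cache (the account name and the ./my_app program are placeholders):

```shell
# Hybrid job script sketch matching the CPU node topology: 24 tasks x 8 threads
# on a full 192-core node. def-someuser and ./my_app are placeholders.
cat > topo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24     # one task per 8-core chiplet
#SBATCH --cpus-per-task=8        # 8 threads sharing the same 32 MiB L3 cache
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --cpu-bind=cores ./my_app   # bind each task to its own set of cores
EOF
```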

GPU Node Topology

In GPU nodes, the architecture is less hierarchical. We have:

  • Two (2) CPU sockets. For each, we have:
    • Eight (8) system memory channels
    • 60 MiB L3 cache
    • 32 equidistant cores, each with its own 2 MiB L2 and 32+48 KiB L1 cache.
    • Two (2) NVidia H100 accelerators

In total, the four (4) accelerators in the node, in the SXM5 form factor, are interconnected via NVLink.

GPU Instances

The different GPU instance names available on Rorqual are:

Type | Instance | Short Name | Unitless Name | By Memory | Full Name
GPU | H100-80gb | h100 | h100 | h100_80gb | nvidia_h100_80gb_hbm3
MIG | H100-1g.10gb | h100_1g.10gb | h100_1.10 | h100_10gb | nvidia_h100_80gb_hbm3_1g.10gb
MIG | H100-2g.20gb | h100_2g.20gb | h100_2.20 | h100_20gb | nvidia_h100_80gb_hbm3_2g.20gb
MIG | H100-3g.40gb | h100_3g.40gb | h100_3.40 | h100_40gb | nvidia_h100_80gb_hbm3_3g.40gb

To request one or more full H100 GPUs, use one of the following Slurm options:

  • One H100-80gb: --gpus=h100:1 or --gpus=h100_80gb:1
  • Multiple H100-80gb per node: --gpus-per-node=h100:2, --gpus-per-node=h100:3 or --gpus-per-node=h100:4
  • Multiple H100-80gb scattered anywhere: --gpus=h100:n (replace n with the desired number)
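For example, a job script reserving all four GPUs of one node could be sketched as follows (the account name and the CPU and memory values are placeholders; see the bundle characteristics for the recommended amounts):

```shell
# Job script sketch requesting all four H100-80gb GPUs of one node.
# def-someuser, the CPU count, and the memory value are placeholders.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gpus-per-node=h100:4   # all four full H100-80gb of one node
#SBATCH --cpus-per-task=16       # placeholder; see the bundle characteristics
#SBATCH --mem=64G                # placeholder; see the bundle characteristics
#SBATCH --time=06:00:00
nvidia-smi                       # show the allocated GPUs
EOF
```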

Approximately half of Rorqual's GPU nodes are configured with MIG technology, and only three GPU instance sizes are available:

  • H100-1g.10gb: 1/8 of the compute power with 10 GB of GPU memory.
  • H100-2g.20gb: 2/8 of the compute power with 20 GB of GPU memory.
  • H100-3g.40gb: 3/8 of the compute power with 40 GB of GPU memory.

To request one and only one GPU instance for your compute job, here are the corresponding options:

  • H100-1g.10gb: --gpus=h100_1g.10gb:1
  • H100-2g.20gb: --gpus=h100_2g.20gb:1
  • H100-3g.40gb: --gpus=h100_3g.40gb:1
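As a sketch, a job script requesting a single H100-3g.40gb instance might look like this (the account name and the train.py workload are placeholders):

```shell
# Job script sketch requesting one 3g.40gb MIG instance.
# def-someuser and train.py are placeholder values.
cat > mig_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gpus=h100_3g.40gb:1    # one instance: 3/8 of the compute power, 40 GB
#SBATCH --time=02:00:00
python train.py                  # hypothetical workload
EOF
```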

The maximum recommended amounts of CPU cores and system memory per GPU instance are listed in the bundle characteristics table.