Rorqual

Availability: June 19, 2025
Login node: rorqual.alliancecan.ca
Data transfer node (rsync, scp, sftp, ...): rorqual.alliancecan.ca
Automation node: robot.rorqual.alliancecan.ca
Globus Collection: alliancecan#rorqual
JupyterHub: jupyterhub.rorqual.alliancecan.ca
Portal: metrix.rorqual.alliancecan.ca
Webinar: slides, video

Rorqual is a heterogeneous and versatile cluster designed for a wide variety of scientific computations. Built by Dell Canada and CDW Canada, Rorqual is located at the École de technologie supérieure. Its name recalls the rorqual, a marine mammal whose several species, for example the minke whale and the blue whale, have been observed in the waters of the St. Lawrence River.

Access

To access the compute cluster, each researcher must complete an access request in the form found via Resources > System Access in the CCDB menu bar. In this form:

  1. Select 'Rorqual' from the left list.
  2. In the first box on the right, select the access request.
  3. Then accept all specific agreements with Calcul Québec:
    • Consent for the Collection and Use of Personal Information,
    • Rorqual Service Level Agreement,
    • Terms of Use.

Warning

Actual access to the cluster may take up to one hour after completing the access request.

Specifics

Warning

By policy, Rorqual compute nodes do not have internet access. To request an exception, please contact technical support and explain your needs and reasons. Note that the crontab tool is not available.

Each job should be at least one hour long (at least five minutes for test jobs), and you cannot have more than 1000 jobs (running and pending) at a time. The maximum job duration is 7 days (168 hours).
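For reference, a minimal batch script respecting these limits might look as follows (the account name def-someuser is a placeholder; replace it with your own allocation):

```shell
# Write a minimal job script; def-someuser is a placeholder account name.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser   # replace with your allocation
#SBATCH --time=01:00:00          # at least one hour; maximum is 7 days (168:00:00)
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
echo "Running on $(hostname)"
EOF
# Submit with: sbatch job.sh
```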

Storage

HOME
  • Lustre filesystem, 116 TB total space
  • Small, fixed quotas per user; this space cannot be expanded, so use your project space for large storage needs
  • Automatic daily backup
SCRATCH
  • Lustre filesystem, 6.5 PB total space
  • Accessible via the symbolic link $HOME/links/scratch
  • Large space for storing temporary files during computations
  • Large, fixed quotas per user
  • No automatic backup; an automatic purging policy removes old files from this space
PROJECT
  • Lustre filesystem, 62 PB total space
  • Accessible via the symbolic link $HOME/links/projects/project-name
  • Designed for data sharing among group members and for storing large amounts of data
  • Large, adjustable quotas per project
  • Automatic daily backup

The table at the very beginning of this page lists several connection addresses. For data transfers via Globus, use the Globus collection. For tools like rsync and scp, use the Data transfer node address.
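For tools like rsync, a transfer to the data transfer node could be sketched as follows (the username someuser and both paths are placeholders):

```shell
# Sketch of a transfer command targeting the data transfer node.
# "someuser" and the paths are placeholder values; adjust them to your account.
DTN="rorqual.alliancecan.ca"
CMD="rsync -avP results/ someuser@${DTN}:scratch/results/"
echo "$CMD"   # print the command instead of running it in this sketch
```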

High-Performance Networking

  • InfiniBand Networking
    • HDR 200Gbit/s
    • Maximum blocking factor: 34:6 or 5.667:1
    • Size of CPU node islands: up to 31 nodes of 192 cores that can communicate without blocking.

Node Specifications

nodes | cores | available memory | storage | CPU | GPU
670 | 192 | 750G or 768000M | 1 x 480GB SATA SSD (6Gbit/s) | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | -
8 | 192 | 750G or 768000M | 1 x 3.84TB NVMe SSD | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | -
8 | 192 | 3013G or 3086250M | 1 x 480GB SATA SSD (6Gbit/s) | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | -
93 | 64 | 498G or 510000M | 1 x 3.84TB NVMe SSD | 2 x Intel Xeon Gold 6448Y @ 2.10 GHz, 60MB L3 cache | 4 x NVidia H100 SXM5 (80GB memory), connected via NVLink
  • To get a larger $SLURM_TMPDIR space, you must request --tmp=xG, where x is a value between 370 and 3360.
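As a sketch, a job script requesting a larger local scratch space and using it via $SLURM_TMPDIR might look like this (the account name, duration, and input archive are placeholders):

```shell
# Job script sketch requesting 370 GB of node-local scratch space.
# def-someuser and dataset.tar are placeholder values.
cat > tmp_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --time=03:00:00
#SBATCH --tmp=370G               # x between 370 and 3360 for a larger $SLURM_TMPDIR
cd "$SLURM_TMPDIR"               # fast node-local storage for temporary files
tar -xf ~/scratch/dataset.tar    # hypothetical input archive
EOF
```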

CPU Node Topology

In a CPU node, the 192 cores and the different memory spaces are not equidistant, which causes variable data-access latencies (on the order of nanoseconds). In each node, we have:

  • Two (2) CPU sockets, each with 12 system memory channels.
    • Four (4) NUMA nodes per CPU socket, each connected to three (3) system memory channels.
      • Three (3) chiplets per NUMA node, each with its own 32 MiB L3 cache.
        • Eight (8) cores per chiplet, each with its own 1 MiB L2 and 32+32 KiB L1 cache.

In other words, we have:

  • Groups of 8 closely located cores that share the same L3 cache, which is ideal for multithreaded parallel programs (for example, with the --cpus-per-task=8 option).
  • NUMA nodes of 3×8 = 24 cores that share a trio of system memory channels.
  • A total of 2×4×3×8 = 192 cores per node.

To fully benefit from this topology, you must reserve full nodes (for example, with --ntasks-per-node=24 --cpus-per-task=8) and explicitly control the placement of processes and execution threads. Depending on the parallel program and the number of cores used, the gains can be marginal or significant.
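The recommendations above can be sketched as a hybrid MPI/OpenMP job script, with one task per 8-core chiplet so each task's threads share an L3 cache (the account name and the ./my_app program are placeholders):

```shell
# Hybrid job script sketch matching the CPU node topology: 24 tasks x 8 threads
# on a full 192-core node. def-someuser and ./my_app are placeholders.
cat > topo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24     # one task per 8-core chiplet
#SBATCH --cpus-per-task=8        # 8 threads sharing the same 32 MiB L3 cache
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --cpu-bind=cores ./my_app   # bind each task to its own set of cores
EOF
```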

GPU Node Topology

In GPU nodes, the architecture is less hierarchical. We have:

  • Two (2) CPU sockets. For each, we have:
    • Eight (8) system memory channels
    • 60 MiB L3 cache
    • 32 equidistant cores, each with its own 2 MiB L2 and 32+48 KiB L1 cache.
    • Two (2) NVidia H100 accelerators

In total, the four (4) accelerators in the node, in the SXM5 form factor, are interconnected via NVLink.

GPU Instances

The different GPU instance names available on Rorqual are:

Type | Instance | Short Name | Unitless Name | By Memory | Full Name
GPU | H100-80gb | h100 | h100 | h100_80gb | nvidia_h100_80gb_hbm3
MIG | H100-1g.10gb | h100_1g.10gb | h100_1.10 | h100_10gb | nvidia_h100_80gb_hbm3_1g.10gb
MIG | H100-2g.20gb | h100_2g.20gb | h100_2.20 | h100_20gb | nvidia_h100_80gb_hbm3_2g.20gb
MIG | H100-3g.40gb | h100_3g.40gb | h100_3.40 | h100_40gb | nvidia_h100_80gb_hbm3_3g.40gb

To request one or more full H100 GPUs, use one of the following Slurm options:

  • One H100-80gb: --gpus=h100:1 or --gpus=h100_80gb:1
  • Multiple H100-80gb per node: --gpus-per-node=h100:2, --gpus-per-node=h100:3 or --gpus-per-node=h100:4
  • Multiple H100-80gb scattered anywhere: --gpus=h100:n (replace n with the desired number)
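For example, a job script reserving all four GPUs of one node could be sketched as follows (the account name and the CPU and memory values are placeholders; see the bundle characteristics for the recommended amounts):

```shell
# Job script sketch requesting all four H100-80gb GPUs of one node.
# def-someuser, the CPU count, and the memory value are placeholders.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gpus-per-node=h100:4   # all four full H100-80gb of one node
#SBATCH --cpus-per-task=16       # placeholder; see the bundle characteristics
#SBATCH --mem=64G                # placeholder; see the bundle characteristics
#SBATCH --time=06:00:00
nvidia-smi                       # show the allocated GPUs
EOF
```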

Approximately half of Rorqual's GPU nodes are configured with MIG technology, and only three GPU instance sizes are available:

  • H100-1g.10gb: 1/8 of the compute power with 10 GB of GPU memory.
  • H100-2g.20gb: 2/8 of the compute power with 20 GB of GPU memory.
  • H100-3g.40gb: 3/8 of the compute power with 40 GB of GPU memory.

To request one and only one GPU instance for your compute job, here are the corresponding options:

  • H100-1g.10gb: --gpus=h100_1g.10gb:1
  • H100-2g.20gb: --gpus=h100_2g.20gb:1
  • H100-3g.40gb: --gpus=h100_3g.40gb:1
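As a sketch, a job script requesting a single H100-3g.40gb instance might look like this (the account name and the train.py workload are placeholders):

```shell
# Job script sketch requesting one 3g.40gb MIG instance.
# def-someuser and train.py are placeholder values.
cat > mig_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gpus=h100_3g.40gb:1    # one instance: 3/8 of the compute power, 40 GB
#SBATCH --time=02:00:00
python train.py                  # hypothetical workload
EOF
```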

The maximum recommended amounts of CPU cores and system memory per GPU instance are listed in the bundle characteristics table.