Béluga

Attention

Béluga has been replaced by a new cluster called Rorqual. To enable the full production rollout of this new cluster, we had to shut down all Béluga compute nodes. The login nodes and storage system will remain accessible. To follow the progressive shutdown steps for Béluga, see this incident page and the Infrastructure Renewal page.

Availability: March 2019
Login node: beluga.alliancecan.ca
Globus collection: computecanada#beluga-dtn
Data transfer node (rsync, scp, sftp, ...): beluga.alliancecan.ca
Portal: https://portail.beluga.calculquebec.ca/

Béluga is a heterogeneous, general-purpose cluster designed for a wide variety of workloads; it is located at the École de technologie supérieure. Its name refers to the beluga whale, a marine mammal that lives in the waters of the St. Lawrence River.

Specifics

Per our policy, Béluga compute nodes do not have internet access. To request an exception, contact technical support and explain your needs. Note that the crontab tool is not available.

Each job should have a minimum duration of one hour (or at least five minutes for test jobs), and a user cannot have more than 1000 jobs (running and pending) at a time. The maximum job duration is 7 days (168 hours).
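
As a minimal illustration (not a prescription), a Slurm submission script that respects these limits might look like the following; the account name, resource values, and program are placeholders:

    #!/bin/bash
    #SBATCH --account=def-someuser   # placeholder allocation account
    #SBATCH --time=03:00:00          # between the 1-hour minimum and the 7-day maximum
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G
    ./my_program                     # placeholder executable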

Storage

HOME
* Lustre filesystem, 105 TB of total space.
* Small, fixed per-user quotas.
* This space is small and cannot be enlarged; use your project space for large storage needs.
* Automatic daily backup.

SCRATCH
* Lustre filesystem, 2.6 PB of total space.
* Large space for storing temporary files during computations.
* Large, fixed per-user quotas.
* No automatic backup.
* Old files in this space are purged automatically.

PROJECT
* Lustre filesystem, 25 PB of total space.
* Designed for sharing data among group members and for storing large amounts of data.
* Large, adjustable per-project quotas.
* Automatic daily backup.
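
To check how much of each quota you are using, the diskusage_report utility available on Alliance clusters gives a per-filesystem summary; this is a sketch, and the exact output format may differ:

    $ diskusage_report
    # prints used space and file counts against the quotas for /home, /scratch, and /project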

For data transfers via Globus, use the computecanada#beluga-dtn collection; for tools such as rsync and scp, use a login node.
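
For example, a typical rsync invocation that copies a local directory to your scratch space through a login node could look like this; the username and paths are placeholders:

    rsync -avP results/ username@beluga.alliancecan.ca:/scratch/username/results/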

High-Performance Networking

A Mellanox InfiniBand EDR (100 Gb/s) network connects all nodes of the cluster. A central 324-port switch aggregates the island connections with a maximum blocking factor of 5:1; the storage servers are connected through a non-blocking interconnect. In practice, this architecture lets parallel jobs of up to 640 cores (or more) run within a single non-blocking island; larger jobs that span multiple islands are subject to the 5:1 blocking factor but still benefit from a high-performance interconnect.
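
As a hedged sketch, a 640-core MPI job that fits within a single non-blocking island could be requested as follows; the account and program names are placeholders:

    #!/bin/bash
    #SBATCH --account=def-someuser   # placeholder allocation account
    #SBATCH --nodes=16               # 16 nodes x 40 cores = 640 cores
    #SBATCH --ntasks-per-node=40
    #SBATCH --time=12:00:00
    srun ./my_mpi_program            # placeholder MPI executable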

Node Specifications

Note: as of July 31, 2025, all nodes below are powered off. Turbo mode is now enabled on all Béluga nodes.

Nodes | Cores | Available memory | CPU                                   | Storage           | GPU
160   | 40    | 92G or 95000M    | 2 x Intel Gold 6148 Skylake @ 2.4 GHz | 1 x 480G SSD      | -
579   | 40    | 186G or 191000M  | 2 x Intel Gold 6148 Skylake @ 2.4 GHz | 1 x 480G SSD      | -
10    | 40    | 186G or 191000M  | 2 x Intel Gold 6148 Skylake @ 2.4 GHz | 6 x 480G SSD      | -
51    | 40    | 752G or 771000M  | 2 x Intel Gold 6148 Skylake @ 2.4 GHz | 1 x 480G SSD      | -
2     | 40    | 752G or 771000M  | 2 x Intel Gold 6148 Skylake @ 2.4 GHz | 6 x 480G SSD      | -
172   | 40    | 186G or 191000M  | 2 x Intel Gold 6148 Skylake @ 2.4 GHz | 1 x 1.6T NVMe SSD | 4 x NVIDIA V100 SXM2 (16 GB memory), connected via NVLink
  • To obtain a larger $SLURM_TMPDIR space, request --tmp=xG, where x is a value between 350 and 2490 (see the sketch below).
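
A minimal sketch of such a request in a job script; the 400G value is illustrative and must lie between 350 and 2490:

    #!/bin/bash
    #SBATCH --time=02:00:00
    #SBATCH --tmp=400G               # illustrative; any value from 350G to 2490G
    cp input.tar "$SLURM_TMPDIR/"    # stage data onto the fast local NVMe storage
    cd "$SLURM_TMPDIR"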

Monitoring Your Jobs

From the portal, you can monitor the resource usage of your CPU and GPU jobs, both running (in near real time) and past, to maximize resource utilization and reduce your queue wait times.

For a given job, you can visualize:
* compute core utilization;
* memory usage;
* GPU utilization.

It is important to use the resources you are allocated and to adjust your requests when they are underutilized or not used at all. For example, if you request four cores (CPUs) but use only one, you should adjust your submission file accordingly, as shown below.
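
As an illustrative sketch, correcting such an over-request is a one-line change in the submission script:

    #SBATCH --cpus-per-task=1   # was 4; monitoring showed only one core was used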