Narval

Availability: since October 2021
Login node: narval.alliancecan.ca
Globus Collection: Compute Canada - Narval
Copy node (rsync, scp, sftp,...): narval.alliancecan.ca
Portal: https://portail.narval.calculquebec.ca/

Narval is a heterogeneous and versatile cluster designed for a wide variety of small to medium-sized scientific computations. Built by Dell Canada and CDW Canada, Narval is located at the École de technologie supérieure. Its name refers to the narwhal, a marine mammal sometimes observed in the waters of the St. Lawrence River.

Particularities

Internet Access and crontab

Our policy dictates that Narval compute nodes do not have access to the internet. To request an exception, please contact technical support explaining what you need and why. Note that the crontab tool is not available.

Each job should run for at least one hour (at least five minutes for test jobs), and you cannot have more than 1000 jobs, running and pending combined, at any time. The maximum job duration is 7 days (168 hours).
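As a sketch, these limits map directly onto standard Slurm directives; the account name and resource values below are placeholders, not recommendations:

```shell
#!/bin/bash
# Hypothetical submission script illustrating Narval's scheduling limits.
#SBATCH --account=def-someuser   # placeholder account name
#SBATCH --time=01:00:00          # at least 1 hour (5 minutes for test jobs);
                                 # the maximum is 7-00:00:00 (7 days)
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G

echo "Running on $(hostname)"
```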

Storage

HOME
Lustre filesystem, 64 TB total space
* This space is small and cannot be expanded; you will need to use your project space for large storage needs.
* Small, fixed quotas per user.
* Automatic daily backups are performed.
SCRATCH
Lustre filesystem, 5.7 PB total space
* Large space for storing temporary files during computations.
* No automatic backup system.
* Large, fixed quotas per user.
* There is an automatic purging of old files in this space.
PROJECT
Lustre filesystem, 35 PB total space
* This space is designed for data sharing among group members and for storing large amounts of data.
* Large, adjustable quotas per project.
* Automatic daily backups are performed.
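To see how much of each quota you are currently using, Alliance clusters provide a disk usage reporting utility; a quick check might look like this (output format varies by cluster):

```shell
# Summarize usage and quotas for your home, scratch, and project spaces.
diskusage_report
```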

The table at the very beginning of this page lists several connection addresses. For data transfers via Globus, use the Globus Collection; for tools such as rsync and scp, use the copy node address.
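For example, a transfer with rsync through the copy node might look like this (the username and paths are placeholders):

```shell
# Copy a local directory to your Narval project space via the copy node.
# Replace 'username' and 'def-someuser' with your own values.
rsync -avP ./results/ username@narval.alliancecan.ca:projects/def-someuser/username/results/
```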

High-Performance Networking

The Mellanox HDR InfiniBand network connects all nodes in the cluster. Each 40-port HDR switch (200 Gb/s) can connect up to 66 nodes at HDR100 speed (100 Gb/s), with 33 HDR links each split in two using special cables. The remaining seven HDR links connect each cabinet's switch to the seven central InfiniBand HDR switches. Node islands are therefore connected with a maximum blocking factor of 33:7 (4.7:1), while storage servers are connected with a significantly lower blocking factor for maximum performance.

In practice, Narval's cabinets contain islands of 48 or 56 regular CPU nodes. It is therefore possible to run parallel jobs of up to 3584 cores on a non-blocking network. For larger jobs, or jobs fragmented across islands, the blocking factor is 4.7:1; even so, the interconnect remains high-performance.
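The 3584-core figure follows directly from the island size:

```shell
# Largest island: 56 regular CPU nodes x 64 cores per node
echo $((56 * 64))   # prints 3584
```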

Node Characteristics

Nodes | Cores | Available Memory  | CPU                                                  | Storage       | GPU
1145  | 64    | 250G or 256000M   | 2 x AMD EPYC 7532 (Zen 2) @ 2.40 GHz, 256M L3 cache  | 1 x 960G SSD  | -
33    | 64    | 2009G or 2057500M | 2 x AMD EPYC 7532 (Zen 2) @ 2.40 GHz, 256M L3 cache  | 1 x 960G SSD  | -
3     | 64    | 4000G or 4096000M | 2 x AMD EPYC 7502 (Zen 2) @ 2.50 GHz, 128M L3 cache  | 1 x 960G SSD  | -
159   | 48    | 498G or 510000M   | 2 x AMD EPYC 7413 (Zen 3) @ 2.65 GHz, 128M L3 cache  | 1 x 3.84T SSD | 4 x NVIDIA A100 SXM4 (40 GB memory), connected via NVLink

AMD Processor Specifics

Supported Instruction Sets

The Narval cluster is equipped with 2nd and 3rd generation AMD EPYC processors that support AVX2 instructions.

However, Narval does not support AVX512 instructions, unlike newer cluster nodes.

Intel Compilers

Intel compilers can effectively compile applications for Narval's AMD processors by limiting them to AVX2 and older instruction sets. To do this, you must use the -march=core-avx2 option with the Intel compiler, which produces executables that are compatible with both Intel and AMD processors.

However, if you compiled code on a system with Intel processors using one or more -xXXXX options, such as -xCORE-AVX2, the resulting applications will not run on Narval, because Intel compilers insert additional instructions to verify that the processor is an Intel product. On Narval, the -xHOST and -march=native options are equivalent to -march=pentium (the original 1993 Pentium) and should not be used.
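As a sketch, a compilation for Narval with the classic Intel compilers might look like this (the source file names are placeholders; the same flag applies to the newer icx/ifx compilers):

```shell
# Target AVX2 so the executable runs on both Intel and AMD processors.
# Avoid -xCORE-AVX2, -xHOST, and -march=native on Narval.
icc   -O2 -march=core-avx2 -o mycode mycode.c
ifort -O2 -march=core-avx2 -o mysim  mysim.f90
```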

Available Software Environments

The standard software environment StdEnv/2023 is the default environment on Narval. Older versions (2016 and 2018) have been deliberately blocked. If you need software that is only available on an older version of the standard environment, we invite you to send a request to our technical support.
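Since StdEnv/2023 is the default, loading it explicitly is optional, but doing so makes job scripts self-documenting:

```shell
# Load the default standard environment and list the modules it provides.
module load StdEnv/2023
module avail
```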

BLAS and LAPACK Libraries

The Intel MKL library works on AMD processors, but it is not optimal. We now favour the use of FlexiBLAS. For more details, consult the BLAS and LAPACK page.
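FlexiBLAS selects its BLAS backend at run time, typically via the FLEXIBLAS environment variable; a sketch (the backend name depends on what is installed):

```shell
# List the BLAS backends available to FlexiBLAS on this system.
flexiblas list

# Select a backend for subsequent programs linked against FlexiBLAS;
# 'blis' (AMD's optimized BLAS) is shown as an example name.
export FLEXIBLAS=blis
```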

GPU Instances

To request one or more full A100 GPUs, use one of the following Slurm options:

  • One A100-40GB:
    --gpus=a100:1
    
  • Multiple A100-40GB per node:
    --gpus-per-node=a100:2
    --gpus-per-node=a100:3
    --gpus-per-node=a100:4
    
  • Multiple A100-40GB scattered anywhere:
    --gpus=a100:n
    
    (replace n with the desired number)
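Put together, a full-GPU job script might look like this sketch (account, CPU, memory, and time values are placeholders):

```shell
#!/bin/bash
# Hypothetical script requesting one full A100-40GB on Narval.
#SBATCH --account=def-someuser   # placeholder account name
#SBATCH --gpus=a100:1
#SBATCH --cpus-per-task=12
#SBATCH --mem=40G
#SBATCH --time=03:00:00

nvidia-smi                       # show the allocated GPU
```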

Several Narval GPU nodes are configured with MIG technology and three GPU instance sizes are available:

  • 1g.5gb: 1/8 of the compute power with 5 GB of GPU memory.
  • 2g.10gb: 2/8 of the compute power with 10 GB of GPU memory.
  • 3g.20gb: 3/8 of the compute power with 20 GB of GPU memory.

To request one and only one GPU instance for your compute job, here are the corresponding options:

  • 1g.5gb:
    --gpus=a100_1g.5gb:1
    
  • 2g.10gb:
    --gpus=a100_2g.10gb:1
    
  • 3g.20gb:
    --gpus=a100_3g.20gb:1
    

The maximum recommended numbers of CPU cores and amount of system memory per GPU instance are listed in the table of bundle characteristics.
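For example, a job using a single MIG instance might be sketched as follows (account, CPU, memory, and workload values are placeholders to be checked against the bundle limits):

```shell
#!/bin/bash
# Hypothetical script requesting a single 3g.20gb MIG instance.
#SBATCH --account=def-someuser     # placeholder account name
#SBATCH --gpus=a100_3g.20gb:1
#SBATCH --cpus-per-task=6          # keep CPU and memory within the bundle limits
#SBATCH --mem=30G
#SBATCH --time=01:00:00

python train.py                    # placeholder workload
```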

Monitoring Your Jobs

From the portal, you can monitor the CPU and GPU usage of your jobs, whether running or completed, in order to maximize resource utilization and reduce your wait times in the queue.

For a given job, you will be able to visualize:
* compute core usage;
* memory usage;
* GPU usage.

It is important to use the allocated resources and to adjust your requests when compute resources are underused or not used at all. For example, if you request four CPU cores but use only one, you should adjust your submission script accordingly.
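Beyond the portal, usage of completed jobs can also be checked from the command line; seff and sacct are standard Slurm utilities (the job ID below is a placeholder):

```shell
# Summarize CPU and memory efficiency of a completed job.
seff 12345678

# sacct reports similar accounting details per job step.
sacct -j 12345678 --format=JobID,Elapsed,AllocCPUS,TotalCPU,MaxRSS
```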