Anaconda
Anaconda is a Python distribution.
Before using Anaconda
We understand that Anaconda is widely used in several fields studied by our users (data science, AI, bioinformatics, etc.). Anaconda is a convenient way to manage Python and its libraries on a personal computer. However, on a cluster like those maintained by the Alliance, library management must be handled by our staff to ensure maximum compatibility and performance. Furthermore, using Anaconda on a compute cluster can lead to several problems. Before using Anaconda, we ask that you contact our technical support so that our experts can explore alternatives with you. If you choose to use Anaconda anyway, please note that our team will not be able to provide support if you encounter issues.
Why Anaconda is Not Recommended on a Compute Cluster
Anaconda can be problematic on a compute cluster for several reasons:
- Installs software (compilers, scientific libraries, etc.) that very often already exists on the Alliance clusters as modules, with sub-optimal configurations that can cause conflicts.
- Installs binaries that are not optimized for our clusters' processors, so your computations could be slower.
- Makes incorrect assumptions about library locations, which can lead to runtime errors.
- By default, installs into your $HOME directory, where it places an enormous number of files. A standalone Anaconda installation can use up nearly half of the file-count quota of your home directory.
- Is slower at installing packages.
- Modifies $HOME/.bashrc, which can cause conflicts.
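If you have already installed Anaconda, you may want to check whether your $HOME/.bashrc was modified. The following Python sketch looks for the block that conda init adds; it assumes the block is delimited by the "# >>> conda initialize >>>" and "# <<< conda initialize <<<" marker comments, so check your own file before relying on it.

```python
from pathlib import Path

# Marker comments that `conda init` is assumed to write around its block.
START_MARKER = "# >>> conda initialize >>>"
END_MARKER = "# <<< conda initialize <<<"


def find_conda_init_block(text):
    """Return (first_line, last_line), 1-based and inclusive, of the first
    complete conda initialize block in `text`, or None if there is none."""
    start = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if start is None and stripped == START_MARKER:
            start = lineno
        elif start is not None and stripped == END_MARKER:
            return (start, lineno)
    return None


if __name__ == "__main__":
    bashrc = Path.home() / ".bashrc"
    if bashrc.is_file():
        block = find_conda_init_block(bashrc.read_text())
        if block is None:
            print(f"No conda initialize block found in {bashrc}")
        else:
            print(f"conda initialize block in {bashrc}: lines {block[0]}-{block[1]}")
    else:
        print(f"{bashrc} does not exist")
```

The script only reports the line range; it does not edit the file. If you decide to remove the block, keep a backup copy of your .bashrc first.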
What Are the Alternatives?
The first step you should take is to contact our technical support so our experts can explore the best alternative for your needs with you. If you prefer to try on your own, two main options are listed below.
Transitioning from Conda to virtualenv
Virtualenv offers all the features you need to use Python on our clusters. This should be your first choice to explore. Here's how to switch to virtualenv if you're using Anaconda on your personal computer:
- List the dependencies (requirements) of the application you want to use. To do this, you can:
  - run pip show <package_name> from your virtual environment (if the package exists on PyPI);
  - or check whether a requirements.txt file exists in the Git repository;
  - or check the install_requires variable in the setup.py file, which lists the requirements.
- Determine which dependencies are Python packages and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries available on Anaconda Cloud, but you should not install them yourself on our clusters: they are already installed.
- Remove anything that is not a Python package from the dependency list (e.g., remove cudatoolkit and cudnn).
- Create a virtualenv and install these dependencies in it.
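The filtering step described in the list above can be sketched in Python as follows. This is a minimal illustration, not a tool we provide: the NON_PYTHON_PACKAGES set is a hypothetical starting point containing only the two examples mentioned above, and the name parsing is simplified (it handles pins such as numpy>=1.24 or conda-style cudnn=8.2, but not every requirement syntax).

```python
import re
from pathlib import Path

# Hypothetical starting set: conda packages that correspond to libraries already
# installed on the clusters. Add any other non-Python entries from your own list.
NON_PYTHON_PACKAGES = {"cudatoolkit", "cudnn"}

# Simplified: a package name is the leading run of letters, digits, ".", "_" and "-".
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_name(requirement):
    """Return the lower-cased package name of e.g. 'numpy>=1.24' or 'cudnn=8.2'."""
    match = _NAME_RE.match(requirement.strip())
    return match.group(0).lower().replace("_", "-") if match else ""


def python_only(requirements):
    """Drop blank lines, comments and entries listed in NON_PYTHON_PACKAGES."""
    kept = []
    for requirement in requirements:
        requirement = requirement.strip()
        if not requirement or requirement.startswith("#"):
            continue
        if requirement_name(requirement) in NON_PYTHON_PACKAGES:
            continue
        kept.append(requirement)
    return kept


def write_requirements(requirements, path="requirements.txt"):
    """Write the Python-only requirements, one per line, to `path`."""
    Path(path).write_text("\n".join(python_only(requirements)) + "\n")
```

For example, python_only(["numpy>=1.24", "cudatoolkit=11.3", "cudnn", "torch"]) returns ["numpy>=1.24", "torch"]. After calling write_requirements, you can install the remaining packages in your activated virtualenv with pip install -r requirements.txt.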
Your application should now work. If not, do not hesitate to contact our technical support.
Using Apptainer
In some situations, the complexity of software dependencies requires a solution in which the environment can be fully controlled. For these situations, we recommend Apptainer. Note that a Docker image can be converted into an Apptainer image. The only drawback of Apptainer is that images consume a lot of disk space, so if your research group plans to use several images, it is wise to keep them together in a single directory within the group's project space to avoid duplication.
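Converting a Docker image into a shared image directory could look like the following Python sketch. It assumes the apptainer executable is on your PATH (for example, after loading the appropriate module) and uses the apptainer build <file>.sif docker://<image> form of the command; the file-naming scheme and the directory passed as images_dir are hypothetical choices for illustration. By default it only prints the command (dry_run=True), and it skips the build when a file with the same name already exists, which is one simple way to avoid duplicate images.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def apptainer_build_command(docker_image: str, images_dir: Path) -> list[str]:
    """Return the argv that converts `docker_image` into a .sif file in `images_dir`."""
    # Hypothetical naming scheme: "python:3.11-slim" -> "python_3.11-slim.sif"
    sif_name = docker_image.replace("/", "_").replace(":", "_") + ".sif"
    return ["apptainer", "build", str(images_dir / sif_name), f"docker://{docker_image}"]


def convert(docker_image: str, images_dir: Path, dry_run: bool = True) -> Path:
    """Build the image once in the shared directory; reuse it if it already exists."""
    images_dir = Path(images_dir)
    cmd = apptainer_build_command(docker_image, images_dir)
    sif_path = Path(cmd[2])
    if sif_path.exists():
        print(f"{sif_path} already exists, reusing it")
    elif dry_run:
        # Only show what would be run.
        print(shlex.join(cmd))
    else:
        images_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)  # requires the apptainer executable on PATH
    return sif_path
```

For example, convert("python:3.11-slim", Path("/project/def-somegroup/containers")) prints the build command for /project/def-somegroup/containers/python_3.11-slim.sif; def-somegroup is a placeholder for your group's actual project directory.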