Anaconda
Anaconda is a Python distribution.
Before using Anaconda
We are aware that Anaconda is widely used in several domains, such as data science, AI and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on a cluster like those supported by the Alliance, these libraries and their dependencies should be managed by our staff to ensure compatibility and optimal performance. Moreover, using Anaconda on a cluster can lead to several problems. Before using Anaconda, please contact our Technical support so that our experts can investigate alternatives with you. If you choose to use Anaconda regardless, note that our team may not be able to support you if you encounter issues.
Why is Anaconda not recommended on a cluster?
Anaconda may cause issues on a cluster for multiple reasons:
- Anaconda very often installs software (compilers, scientific libraries, etc.) which already exist on our clusters as modules, with a configuration that is not optimal, and which may cause conflicts.
- It installs binaries which are not optimized for the processor architecture on our clusters. Your jobs may be slower because of it.
- It makes incorrect assumptions about the location of various system libraries. Your jobs may encounter errors when running.
- Anaconda uses the $HOME directory for its installation, where it writes an enormous number of files. A single Anaconda installation can easily absorb almost half of your quota for the number of files in your home directory.
- Anaconda is slower than installing packages via Python wheels.
- Anaconda modifies the $HOME/.bashrc file, which can easily cause conflicts.
What are the alternatives?
The first step you should take is to contact our Technical support, so that our experts can investigate with you the best alternative for your needs. If you prefer to attempt it yourself, two main options are listed below.
Transition from Conda to virtualenv
A virtual environment offers all the functionality you need to use Python on our clusters. This should be the first option you explore. If you use Anaconda on your personal computer, here is how to switch to virtual environments:
- List the dependencies (requirements) of the application you want to use. To do so, you can:
  - run pip show <package_name> from your virtual environment (if the package exists on PyPI);
  - or check whether there is a requirements.txt file in the Git repository;
  - or check the install_requires variable in the setup.py file, which lists the requirements.
- Find which dependencies are Python modules and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries which are available on Anaconda Cloud but which you should not install yourself on our clusters; they are already installed.
- Remove from the list of dependencies everything which is not a Python module (e.g. cudatoolkit and cudnn).
- Use a virtual environment in which you will install your dependencies.
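The filtering step can be sketched in Python as follows. It is a minimal illustration under assumptions: NON_PYTHON_PACKAGES is a hypothetical, deliberately short list built from the two examples above (cudatoolkit and cudnn), and declared_requirements uses the standard library's importlib.metadata.requires to read the requirement strings an installed package declares, which is roughly the information pip show summarizes under "Requires".

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, requires

# Hypothetical, deliberately short list of Anaconda packages that are libraries
# rather than Python modules; extend it for your own dependency list.
NON_PYTHON_PACKAGES = {"cudatoolkit", "cudnn"}


def declared_requirements(package: str) -> list[str]:
    """Requirement strings an installed package declares, or [] if unknown."""
    try:
        return requires(package) or []
    except PackageNotFoundError:
        return []


def split_dependencies(deps: list[str]) -> tuple[list[str], list[str]]:
    """Split dependency specs into (python_modules, removed_libraries)."""
    keep: list[str] = []
    removed: list[str] = []
    for spec in deps:
        spec = spec.strip()
        if not spec or spec.startswith("#"):
            continue  # skip blank lines and comments
        # Package name = text before any version constraint or marker,
        # e.g. "cudnn=8.2" -> "cudnn", "numpy>=1.21" -> "numpy".
        name = re.split(r"[<>=!~;\[\s]", spec, maxsplit=1)[0].lower()
        if name in NON_PYTHON_PACKAGES:
            removed.append(spec)
        else:
            keep.append(spec)
    return keep, removed


if __name__ == "__main__":
    keep, removed = split_dependencies(
        ["numpy>=1.21", "cudatoolkit=11.3", "cudnn", "torch"]
    )
    print("install with pip:", keep)
    print("provided as modules:", removed)
```

The kept entries could then go into a requirements.txt file for your virtual environment, while the removed ones correspond to libraries that are already installed on the clusters.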
Your software should then run. If it does not, do not hesitate to contact us.
Using Apptainer
In some situations, the complexity of a program's dependencies requires a solution in which you control the entire software environment. In these situations, we recommend Apptainer; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is its disk space consumption. If your research group plans on using several images, it would be wise to collect all of them in a single directory of the group's project space to avoid duplication.
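As a sketch of the shared-directory advice, the Python snippet below builds the apptainer build command that converts a Docker image into an Apptainer (.sif) image, but skips the build when an image with the same name already exists in the group's directory. The directory projects/def-mygroup/apptainer-images and the naming scheme in sif_name are hypothetical; adapt them to your group. The script only prints the command rather than running it.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def sif_name(docker_ref: str) -> str:
    """Map a Docker reference such as 'ubuntu:22.04' to 'ubuntu_22.04.sif'."""
    return docker_ref.replace("/", "_").replace(":", "_") + ".sif"


def build_command(docker_ref: str, image_dir: Path) -> list[str] | None:
    """Return the apptainer build command, or None if the image already exists."""
    target = image_dir / sif_name(docker_ref)
    if target.exists():
        return None  # reuse the group's existing copy instead of duplicating it
    return ["apptainer", "build", str(target), f"docker://{docker_ref}"]


if __name__ == "__main__":
    # Hypothetical shared directory in the group's project space.
    shared = Path.home() / "projects" / "def-mygroup" / "apptainer-images"
    cmd = build_command("ubuntu:22.04", shared)
    if cmd is None:
        print("Image already present; reusing it.")
    else:
        # Print only; pass cmd to subprocess.run(cmd, check=True) to build for real.
        print("Would run:", shlex.join(cmd))
```

Checking for an existing file before building is a simple way to keep one copy per image; a group with many images might prefer an agreed naming convention that everyone follows.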