Conda
Note: This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.
Introduction
Conda is an open-source package and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, macOS and Windows. It was created for Python programs but can package and distribute any software.
The conda package and environment manager is included in all versions of Anaconda®, Miniconda, and Anaconda Repository. Conda is also available on conda-forge, a community channel.
Anaconda or Miniconda?
Anaconda ships a full distribution of preinstalled packages, while Miniconda is a minimal version that contains only the essentials (conda, Python and a few core dependencies).
Conda usage
Conda initialization and activation
Conda initialization is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment.
The conda shell function is mainly a forwarder: it delegates most commands to the real conda executable, which is implemented as a Python library.
There are two ways to activate conda:
- Activate conda in your current shell only.
- Activate conda permanently in your login shell environment (this modifies your ~/.bashrc by adding conda setup directives).
If you’d prefer that conda’s base environment not be activated on startup, set the auto_activate_base parameter to false:
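A minimal sketch of the commands behind the two activation methods and the auto_activate_base setting, assuming a bash shell where conda is already on the PATH (on Grid'5000, after loading the conda module). The helper function names are hypothetical; only the conda subcommands inside them are standard.

```shell
# Hypothetical helpers wrapping standard conda commands (bash assumed).

# Method 1: activate conda in the current shell only, by evaluating
# the shell functions printed by conda's bash hook.
conda_activate_current_shell() {
  eval "$(conda shell.bash hook)"
}

# Method 2: activate conda permanently; 'conda init bash' appends
# conda setup directives to your ~/.bashrc.
conda_activate_permanently() {
  conda init bash
}

# Do not activate the base environment on startup
# (stored as 'auto_activate_base: false' in your ~/.condarc).
conda_disable_auto_base() {
  conda config --set auto_activate_base false
}
```

Method 1 only affects the shell in which it runs, so it is convenient in job scripts; method 2 takes effect in every new login shell.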
By default, you are in the base Conda environment, which corresponds to the base installation of Conda.
The conda activate and conda deactivate commands rely on the conda shell initialization to load or unload the corresponding environment variables in the current shell session.
Conda environments
Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.
When you begin using conda, you already have a default environment named "base". You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.
- List all your environments (two equivalent commands)
- Create a new environment
- Activate this environment before installing packages
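A minimal sketch of the environment commands listed above, using standard conda subcommands. The helper names list_envs and create_env are hypothetical, and --yes is only there to skip the confirmation prompt.

```shell
# List all your environments; 'conda info --envs' prints the same list.
list_envs() {
  conda env list
}

# Create a new environment, optionally pinning the Python version.
# Usage: create_env NAME [PYTHON_VERSION]
create_env() {
  local name="$1" python_version="${2:-}"
  if [ -n "$python_version" ]; then
    conda create --yes --name "$name" "python=$python_version"
  else
    conda create --yes --name "$name"
  fi
}
```

For example, create_env myenv 3.10 followed by conda activate myenv creates and activates an environment named myenv (a placeholder name).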
For further information:
- https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
- Managing your data science project environments with Conda
Conda package installation
In its default configuration (the default Conda channel), Conda can install and manage more than 7,500 packages at https://repo.anaconda.com/pkgs/ that are built, reviewed, and maintained by Anaconda.
- Install a specific version of a package:
- Uninstall a package:
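A minimal sketch of the two operations above with standard conda commands; the helper names and the example package are hypothetical. Note that a spec such as numpy=1.24.3 also matches sub-versions of 1.24.3, while numpy==1.24.3 requests that exact version.

```shell
# Install a specific version of a package into environment ENV.
# Usage: install_version ENV PACKAGE VERSION
install_version() {
  local env="$1" pkg="$2" version="$3"
  conda install --yes --name "$env" "${pkg}=${version}"
}

# Uninstall a package from environment ENV.
# Usage: uninstall_pkg ENV PACKAGE
uninstall_pkg() {
  conda remove --yes --name "$1" "$2"
}
```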
Conda package installation from channels
Channels are the locations of the repositories where Conda looks for packages. Channels may point to a cloud repository or to a private location on a remote or local repository that you or your organization created. Useful channels are:
- conda-forge, from https://conda-forge.org. It is free for all to use.
- nvidia, from https://anaconda.org/nvidia. It provides NVIDIA's software.
- Install a package from a specific channel:
- List all installed packages with their source channels:
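A minimal sketch of the two channel operations above; the helper names are hypothetical, while --channel and --show-channel-url are standard conda options. Conda also accepts the shorthand CHANNEL::PACKAGE (for example conda-forge::gcc).

```shell
# Install a package from a specific channel into the active environment.
# Usage: install_from_channel CHANNEL PACKAGE
install_from_channel() {
  local channel="$1" pkg="$2"
  conda install --yes --channel "$channel" "$pkg"
}

# List installed packages together with the channel URL they came from.
list_with_channels() {
  conda list --show-channel-url
}
```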
Suggested reading
- Conda cheat sheet for current commands
- Getting Started with Conda
Using conda on Grid'5000
Conda is already available on Grid'5000 as a module. You do not need to install Anaconda or Miniconda on Grid'5000! To make it available on a node or on a frontend, load the Conda module as follows:
- For Miniconda (conda 22.11.1):
- For Anaconda (conda 4.12.0):
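A minimal sketch for loading Conda, assuming the module name miniconda3 used in the batch example on this page. The Anaconda module name is not given here, so the second helper simply searches the module list for conda-related entries; both helper names are hypothetical.

```shell
# Load the Miniconda module and print the conda version it provides.
load_miniconda() {
  module load miniconda3
  conda --version
}

# Show available conda-related modules (module avail may write to stderr,
# hence the redirection before filtering).
find_conda_modules() {
  module avail 2>&1 | grep -i conda
}
```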
By default, Conda and all packages are installed locally with a user-specific configuration. In the Grid'5000 context, Conda comes with some pre-existing packages in the base environment.
- To show the location of Conda and of the current environment:
active environment : base
active env location : /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg
...
base environment : /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg (read only)
...
envs directories : /home/lmirtain/.conda/envs
/grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg/envs
...
Here, the active environment is base; it is located on NFS storage (/grid5000/...) and is read only.
- To list installed packages in the current environment:
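A minimal sketch of the two inspection commands above, plus a hypothetical helper that extracts the active environment path from the "active env location : ..." line shown in the conda info output. It assumes that line keeps the "label : value" layout shown above.

```shell
# Print the path of the active environment from 'conda info'.
active_env_location() {
  conda info | awk -F' : ' '/active env location/ {print $2}'
}

# List the packages installed in the current environment.
list_current_packages() {
  conda list
}
```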
Create conda environments on Grid'5000
Basic Conda workflow
- Load the conda module and activate bash completion
- Create an environment (specify a Python version; otherwise, the module's default version is used)
- Activate this environment
- Install a package
- Deactivate the environment
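The workflow above can be sketched as a single hypothetical helper, assuming the miniconda3 module. The page's "bash completion" step is approximated here by evaluating conda's bash hook, which is an assumption rather than the exact original command.

```shell
# Hypothetical end-to-end helper for the basic workflow.
# Usage: conda_workflow ENV_NAME PYTHON_VERSION PACKAGE
conda_workflow() {
  local env_name="$1" python_version="$2" package="$3"
  module load miniconda3                 # load the conda module
  eval "$(conda shell.bash hook)"        # enable 'conda activate' in this shell
  conda create --yes --name "$env_name" "python=$python_version"
  conda activate "$env_name"             # activate the new environment
  conda install --yes "$package"         # install into the active environment
  conda deactivate                       # leave the environment
}
```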
Remove unused Conda environments
Warning: Conda packages are installed in ...
- To delete an environment:
- To remove unused packages and caches (do not worry if this appears to try to delete packages of the system environment, i.e. the non-local one):
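A minimal sketch of the two clean-up operations above with standard conda commands; the helper names are hypothetical.

```shell
# Delete an environment and all of its packages.
# Usage: delete_env ENV_NAME
delete_env() {
  conda remove --yes --name "$1" --all
}

# Remove unused packages, tarballs and index caches.
clean_conda_cache() {
  conda clean --all --yes
}
```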
Use a Conda environment on Grid'5000
As seen in the previous section, Conda environments are stored by default in the user's home directory (in ~/.conda). Once an environment is created and its packages are installed, it can be used on all nodes of the same site.
For interactive jobs
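A minimal sketch for interactive jobs: reserve a node from the frontend with oarsub -I (default resources), then, in the shell opened on the node, load the module and activate your environment. The helper name is hypothetical and the miniconda3 module name is assumed from the batch example.

```shell
# On the frontend:
#   oarsub -I
# Then, on the reserved node:
use_env_interactively() {
  module load miniconda3
  eval "$(conda shell.bash hook)"   # make 'conda activate' available
  conda activate "$1"
}
```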
For batch jobs
Warning: As the module command is not a real executable but a shell function, it must be executed in an actual shell to work. A simple ...
The following example shows how, in a batch job, to load Miniconda, initialize conda, activate your conda environment and check that it works.
- First, prepare the conda environment on the frontend:
- load Miniconda and run conda init (to modify your ~/.bashrc file)
- create an environment "testconda" containing "gcc" from the conda-forge channel
- list the installed packages with their source channels
- Then, launch a batch job that performs the same tasks.
- The important step is to source the shell environment so that module and conda activate can be executed.
fsiteA:
oarsub 'bash -l -c ". /etc/profile ; module load miniconda3 ; source ~/.bashrc ; conda activate testconda ; conda info ; conda list -n testconda --show-channel-url"'
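The same batch job can also be written as a script file submitted with oarsub -S, which reads #OAR directives from the script. This is a sketch: the file name conda_job.sh, the job name and the resource request are hypothetical placeholders, and the script body mirrors the one-liner above.

```shell
# Write the job script; the quoted 'EOF' keeps $-expressions unexpanded.
cat > conda_job.sh <<'EOF'
#!/bin/bash -l
#OAR -l nodes=1,walltime=0:30:00
#OAR -n conda-test
# Same steps as the one-liner: source the shell environment first.
. /etc/profile
module load miniconda3
source ~/.bashrc
conda activate testconda
conda info
conda list -n testconda --show-channel-url
EOF
chmod +x conda_job.sh   # oarsub -S requires an executable script
# Submit from the frontend with:  oarsub -S ./conda_job.sh
```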
Advanced Conda environment operations
Synchronize Conda environments between Grid'5000 sites
- To synchronize a Conda directory from siteA to siteB:
To actually perform the synchronization, remove the --dry-run argument and replace siteB with a real site name.
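A minimal sketch of such a synchronization with rsync, assuming the target site's frontend is reachable over SSH under the site name (for example through your SSH configuration). The helper name is hypothetical; remove --dry-run once the preview looks right.

```shell
# Preview copying ~/.conda to the home directory on another site.
# Usage: sync_conda_to_site SITE
sync_conda_to_site() {
  local target_site="$1"
  # Trailing slashes copy the contents of ~/.conda into the remote .conda/
  rsync --dry-run --archive --verbose "$HOME/.conda/" "${target_site}:.conda/"
}
```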
You can use two different approaches to share Conda environments with other users.
Export an environment as a yaml file
- Export it as follows:
- Share it by putting the yaml file in your public folder
- Other users can create the environment from the environment.yml file
- Advantage: it prevents other users from damaging the environment by adding packages that could conflict with existing ones, or by deleting packages that another user might need.
- Drawback: it is not a true shared environment. The environment is duplicated in other users' home directories, and any modification of one copy is not automatically replicated to the others.
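A minimal sketch of the export, share and re-create steps with standard conda env subcommands. The helper names are hypothetical, and the public folder is assumed to be ~/public.

```shell
# Export environment ENV to ENV.yml in the current directory.
export_env() {
  local env="$1"
  conda env export --name "$env" > "${env}.yml"
}

# Share the file by copying it to your public folder (assumed ~/public).
share_env_file() {
  cp "$1" "$HOME/public/"
}

# Another user re-creates the environment from the yaml file.
create_from_file() {
  conda env create --file "$1"
}
```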
Use a group storage
Group Storage lets multiple users share the same storage space. You can take advantage of a group storage to share a single Conda environment among multiple users.
- Create a shared Conda environment (--prefix lets you specify the path where the conda environment is stored)
- Activate the shared environment (share this command with the targeted users)
- Advantage: it avoids storing duplicate packages and makes any modification accessible to all users.
- Drawback: users could harm the environment by installing or removing packages.
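A minimal sketch of the shared-environment commands, using conda's --prefix option. The helper names are hypothetical, and PREFIX stands for a directory on your group storage (the actual path depends on your storage and is not given here).

```shell
# Create an environment at an explicit path instead of ~/.conda/envs.
# Usage: create_shared_env PREFIX PYTHON_VERSION
create_shared_env() {
  local prefix="$1" python_version="$2"
  conda create --yes --prefix "$prefix" "python=$python_version"
}

# Activate it by path; this is the command to share with other users.
activate_shared_env() {
  conda activate "$1"
}
```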
Mamba as an alternative to Conda
mamba is a reimplementation of the conda package manager in C++. Mamba is fully compatible with Conda packages and supports most of Conda's commands. It consists of:
- mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions
- micromamba: a pure C++-based CLI, self-contained in a single-file executable
- libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built
Mamba is relatively new and less widely used than Conda, so it probably has more undiscovered bugs, and new bugs may take longer to be found. Mamba is worth considering in a DevOps chain that tests and deploys environments (e.g., Docker images) through continuous integration pipelines: Conda has a reputation for being slow when resolving complex sets of dependencies, which can make CI jobs take longer than necessary.
- Install mamba when you already have Conda:
- Installing packages works the same way, for example:
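A minimal sketch of both steps. Because the base environment is read only on Grid'5000 (see the conda info output above), this sketch assumes mamba is installed into one of your own environments from conda-forge; the helper names are hypothetical.

```shell
# Install mamba from conda-forge into environment ENV.
# Usage: install_mamba_into ENV
install_mamba_into() {
  conda install --yes --name "$1" --channel conda-forge mamba
}

# Install packages with mamba, using the same syntax as conda.
# Usage: mamba_install PACKAGE...
mamba_install() {
  mamba install --yes "$@"
}
```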
Build your HPC-IA framework with conda
Here are some pointers to help you set up your software environment for HPC or AI with conda