Conda: Difference between revisions

From Grid5000
Revision as of 09:11, 26 June 2023

Note.png Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.


Introduction

Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.

Conda, Miniconda, Anaconda?

  • conda is the package manager.
  • miniconda is a minimal installer for conda.
  • anaconda is another installer that includes 160+ packages, through the anaconda meta-package.

On Grid'5000, we installed conda using the miniconda installer, but you are free to create an anaconda environment, using the anaconda meta-package.

Related links

Conda on Grid'5000

Conda is already available in Grid'5000 as a module. You do not need to install Anaconda or Miniconda on Grid'5000!

Load Conda module

  • To show the available versions:
Terminal.png frontal:
module avail conda
  • To make it available on a node or on a frontend, load the Conda module as follows (default version):
Terminal.png frontal:
module load conda

Optional: Conda initialization and activation in your shell

Conda initialization is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment. It is not required to use Conda.

The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library.

There are two ways to activate conda:

  • 1. occasionally: activate conda in your current shell
Terminal.png $:
eval "$(conda shell.bash hook)"
  • 2. always: activate conda in your login shell environment permanently (this command modifies your .bashrc by adding conda setup directives)
Terminal.png $:
conda init

As module lets you load different versions of conda, we recommend the "occasionally" method: it ensures that the PATH of your shell corresponds to the version of conda that is currently loaded.

The conda activate and conda deactivate commands rely on the conda shell initialization to load the corresponding conda environment variables into the current shell session, or to unload them from it.
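
As a minimal sketch of the "occasionally" method, the helper below checks that a conda command is reachable before evaluating the hook, so that the failure is explicit when the module has not been loaded. The name init_conda_session is hypothetical; it is not provided by conda or by Grid'5000.

```shell
# Hypothetical helper (not provided by conda or Grid'5000): initialize conda
# in the current shell only, without touching ~/.bashrc.
init_conda_session() {
    # The hook can only be generated once a conda command is reachable,
    # i.e. after "module load conda".
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found: run 'module load conda' first" >&2
        return 1
    fi
    # Define the conda shell function and the activate/deactivate machinery.
    eval "$(conda shell.bash hook)"
}

# Typical use on a frontend or a node:
#   module load conda
#   init_conda_session && conda activate ENVNAME
```

Because nothing is written to .bashrc, a new shell starts without any conda setup and the helper has to be called again after each module load.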

By default, you are placed in the base Conda environment, which corresponds to the base installation of Conda.

If you’d prefer that conda’s base environment not be activated on startup, set the auto_activate_base parameter to false:

Terminal.png $:
conda config --set auto_activate_base false

Verify your conda configuration with this command:

Terminal.png $:
conda config --show

Look at all available configuration options with:

Terminal.png $:
conda config --describe

Conda environments

Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.

When you begin using conda, you already have a default environment named base. You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.

Warning.png Warning

The base environment is stored in a read-only directory, as shown by the conda info command. That is why you must always create your own conda environments to install the software you need.

  • List all your environments
Terminal.png $:
conda info --envs

or

Terminal.png $:
conda env list
  • Create a new environment
Terminal.png $:
conda create --name ENVNAME
  • Activate this environment before installing packages
Terminal.png $:
conda activate ENVNAME
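
The listing and creation commands above can be combined so that an environment is only created when it does not exist yet. This is a sketch under the assumption that conda env list prints one environment per line with its name in the first column and comment lines starting with '#'; conda_env_exists is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if an environment with the given name is
# listed by "conda env list" (comment lines starting with '#' are skipped).
conda_env_exists() {
    conda env list | awk -v name="$1" '
        /^#/ { next }
        $1 == name { found = 1 }
        END { exit !found }'
}

# Create the environment only when it does not exist yet:
#   conda_env_exists ENVNAME || conda create --name ENVNAME
```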

For further information:

Conda package installation

In its default configuration (the default Conda channel), Conda can install and manage the over 7,500 packages at https://repo.anaconda.com/pkgs/ that are built, reviewed, and maintained by Anaconda.

Terminal.png $:
conda install <package>
  • Install a specific version of a package:
Terminal.png $:
conda install <package>=<version>
  • Uninstall a package:
Terminal.png $:
conda uninstall <package>

For further information:

Conda package installation from channels

Channels are the locations of the repositories where Conda looks for packages. Channels may point to a Cloud repository or a private location on a remote or local repository that you or your organization created. Useful channels are:

To install a package from a specific channel:

Terminal.png $:
conda install -c <channel_name> <package>
  • List all packages installed with their source channels
Terminal.png $:
conda list --show-channel-urls

For further information:

Warning.png Warning

Installing Conda packages can be time- and resource-consuming. Prefer a node (rather than a frontend) for such operations. Note that using a node is mandatory if you need access to specific hardware resources such as GPUs.
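
One way to follow this advice is to guard installations so that they only run inside a job. The sketch below assumes that OAR exports the OAR_JOB_ID variable inside a job and leaves it unset on frontends, which should be checked on your site; both function names are hypothetical.

```shell
# Hypothetical guard. Assumption: OAR exports OAR_JOB_ID inside a job and
# leaves it unset on frontends.
in_oar_job() {
    [ -n "${OAR_JOB_ID:-}" ]
}

# Run "conda install" only when called from a node reserved through OAR.
conda_install_on_node() {
    if ! in_oar_job; then
        echo "not inside an OAR job: reserve a node first (e.g. oarsub -I)" >&2
        return 1
    fi
    conda install "$@"
}

# Example, from a node obtained with "oarsub -I":
#   conda_install_on_node -c conda-forge gcc_linux-64
```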

Application examples

Create an environment

For example, create the environment <env_name>. Specify a Python version if you need one; otherwise, the default version of the module is used.

Terminal.png fgrenoble:
conda create -y -n <env_name> python=x.y

Load this environment

Terminal.png fgrenoble:
conda activate <env_name>

Install a package into the loaded environment

Terminal.png fgrenoble:
conda install <package_name>

Exit from the loaded environment

Terminal.png fgrenoble:
conda deactivate

Remove unused Conda environments

Warning.png Warning

Conda packages are installed in $HOME/.conda. You could, therefore, rapidly saturate your homedir quota (25 GB by default). Do not forget to occasionally remove unused Conda environments to free up space.

  • To delete an environment
Terminal.png fgrenoble:
conda deactivate
conda env remove --name <env_name>
  • To remove unused packages and the cache. Do not be concerned if this appears to try to delete the packages of the system environment (i.e. non-local).
Terminal.png fgrenoble:
conda clean -a
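
To decide which environments are worth removing, it helps to look at their sizes first. The following is a minimal sketch, assuming that environments live under ~/.conda/envs (consistent with the $HOME/.conda location mentioned in the warning above); conda_envs_usage is a hypothetical helper name.

```shell
# Hypothetical helper: print the size (in kilobytes) of each environment
# directory, largest last. The default location is an assumption.
conda_envs_usage() {
    envs_dir="${1:-$HOME/.conda/envs}"
    [ -d "$envs_dir" ] || { echo "no such directory: $envs_dir" >&2; return 1; }
    # du -sk reports kilobytes; sorting numerically makes big environments stand out.
    du -sk "$envs_dir"/*/ 2>/dev/null | sort -n
}

# Example:
#   conda_envs_usage
#   conda env remove --name <env_name>
```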

Use a Conda environment in a job

As seen in the previous section, Conda environments are stored by default in the user's home directory (at ~/.conda). Once an environment is created and its packages are installed, it is usable on all nodes of the given site.

For interactive jobs

Warning.png Warning

Lucas> the conda shell.bash hook is unnecessary here, isn't it?

Load, initialize, and activate your conda environment env_name in an interactive job

Terminal.png frontal:
oarsub -I
Terminal.png node:
module load conda

eval "$(conda shell.bash hook)"

conda activate env_name

For batch jobs

Load, initialize, and activate your conda environment env_name in a batch job

Warning.png Warning

As the module command is not a real executable but a shell function, it must be executed in an actual shell to work.
A simple oarsub "module load conda" would fail.
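
The difference between a shell function and an executable can be reproduced in any POSIX shell, independently of Grid'5000. In the sketch below, demo_module is a hypothetical stand-in for module, and env is used only because it looks the command up as a file on the PATH, as a launcher that is not a shell would. Whether a particular oarsub invocation fails depends on how the command is eventually executed on the node.

```shell
# demo_module stands in for "module": it exists only as a function of the
# current shell (hypothetical name, for illustration only).
demo_module() { echo "loaded $1"; }

# Called from the shell that defined it, the function is found:
demo_module conda

# Run through a program that looks for an executable file on the PATH,
# the lookup fails because there is no "demo_module" file anywhere:
env demo_module conda 2>/dev/null || echo "demo_module is not an executable"
```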

Warning.png Warning

Lucas> are you sure? It seems to work, though

First prepare your conda environment on the frontend:

  • module load and conda initialization
  • creation of a conda environment named testconda, containing gcc from the conda-forge channel
  • list installed packages with source info
Terminal.png fsiteA:
module load conda

eval "$(conda shell.bash hook)"
conda create --name testconda
conda activate testconda

conda install -c conda-forge gcc_linux-64 gxx_linux-64
  • run these commands and keep their output
Terminal.png fsiteA:
conda info
conda list -n testconda --show-channel-urls

In this example, we launch a job that performs the same tasks, but as a batch job.

  • The important step is to start a login shell, so that the shell environment is sourced and both module and conda activation work
Terminal.png fsiteA:
oarsub 'bash -l -c "module load conda ; eval \"\$(conda shell.bash hook)\" ; conda activate testconda ; conda info ; conda list -n testconda --show-channel-urls"'
OAR_JOB_ID=1539228
Warning.png Warning

Lucas> the conda shell.bash hook is unnecessary here, isn't it?

  • Is the job finished?
Terminal.png fsiteA:
oarsub -C 1539228
# Error: job 1539228 is not running. Its current state is Finishing.
  • Compare the output with the previous one: they should be identical
Terminal.png fsiteA:
cat OAR.1539228.stdout
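
An alternative that avoids nested quoting on the oarsub command line is to put the same steps in a script and submit the script. This is only a sketch: conda_job.sh is an arbitrary file name, and the -l option on the shebang line is assumed to play the same role as in bash -l -c, namely starting a login shell in which module is defined.

```shell
# Write a hypothetical job script (conda_job.sh is an arbitrary name).
# Assumption: the login shell requested by "-l" defines the module function.
cat > conda_job.sh <<'EOF'
#!/bin/bash -l
module load conda
eval "$(conda shell.bash hook)"
conda activate testconda
conda info
conda list -n testconda --show-channel-urls
EOF
chmod +x conda_job.sh

# Submit it from the frontend with:
#   oarsub ./conda_job.sh
```

The quoted 'EOF' delimiter keeps $(conda shell.bash hook) from being expanded while the file is written, so the hook is evaluated on the node when the job runs.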

Advanced Conda environment operations

Synchronize Conda environments between Grid'5000 sites

  • To synchronize a Conda directory from siteA to siteB:
Terminal.png fsiteA:
rsync --dry-run --delete -avz ~/.conda siteB.grid5000.fr:~

With --dry-run, rsync only reports what it would transfer or delete. To actually perform the synchronization, remove the --dry-run argument and replace siteB with a real site name.

Share Conda environments between multiple users

You can use two different approaches to share Conda environments with other users.

Export an environment as a yaml file

  • Export it as follows:
Terminal.png fgrenoble:
conda env export > environment.yml
  • Share it by putting the yaml file in your public folder
Terminal.png fgrenoble:
cp environment.yml ~/public/
  • Other users can create the environment from the environment.yml file
Terminal.png fgrenoble:
conda env create -f ~<login>/public/environment.yml
  • Advantage: other users cannot damage the environment by adding packages that conflict with existing ones or by deleting packages that another user needs.
  • Drawback: it is not a true shared environment. The environment is duplicated in each user's home directory, and a modification made to one copy is not automatically replicated to the others.
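
The export and copy steps can be combined into a single step. The sketch below is an assumption-laden convenience wrapper: publish_conda_env is a hypothetical name, and the target directory defaults to the ~/public folder used above.

```shell
# Hypothetical helper: export the named environment directly into the
# public folder as <ENVNAME>.yml.
publish_conda_env() {
    env_name="$1"
    public_dir="${2:-$HOME/public}"
    [ -n "$env_name" ] || { echo "usage: publish_conda_env ENVNAME [DIR]" >&2; return 1; }
    mkdir -p "$public_dir" &&
    conda env export --name "$env_name" > "$public_dir/$env_name.yml"
}

# Example:
#   publish_conda_env testconda
```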

Use a group storage

Group Storage lets you share storage space between multiple users. You can take advantage of a group storage to share a single Conda environment among multiple users.

  • Create a shared Conda environment with --prefix to specify the path to use to store the conda environment
Terminal.png flyon:
conda create --prefix /srv/storage/storage_name@server_hostname_(fqdn)/ENVNAME


  • Activate the shared environment (share this command with the targeted users)
Terminal.png flyon:
conda activate /srv/storage/storage_name@server_hostname_(fqdn)/ENVNAME
  • Advantage: it avoids storing duplicate packages and makes any modification available to all users
  • Drawbacks:
    • Users could potentially harm the environment by installing or removing packages.
    • When installing additional packages, conda still stores them in the package cache located in your home directory. Use conda clean as described above to clean those files.


  • Create your environments by default in a group storage location

You can modify your ~/.condarc file to set this location for conda environments and package installation as follows (adapt the location to your group and your needs). Add these lines:

pkgs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/pkgs/
envs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/envs/
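
The same settings can also be written with conda config instead of editing the file by hand. This is a sketch, not the method prescribed by this page: use_shared_conda_dirs is a hypothetical helper, and the envs and pkgs subdirectory names simply mirror the excerpt above.

```shell
# Hypothetical helper: register a shared directory as the first location for
# environments and for the package cache. "conda config --add" puts the value
# at the beginning of the corresponding list in ~/.condarc.
use_shared_conda_dirs() {
    shared_root="$1"
    [ -n "$shared_root" ] || { echo "usage: use_shared_conda_dirs DIR" >&2; return 1; }
    conda config --add envs_dirs "$shared_root/envs" &&
    conda config --add pkgs_dirs "$shared_root/pkgs"
}

# Example (same placeholders as in the ~/.condarc excerpt):
#   use_shared_conda_dirs "/srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs"
#   conda config --show envs_dirs pkgs_dirs
```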

Mamba as an alternative to Conda

mamba is a reimplementation of the conda package manager in C++. Mamba is fully compatible with Conda packages and supports most of Conda's commands. It consists of:

  • mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions
  • micromamba: a pure C++-based CLI, self-contained in a single-file executable
  • libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built

Mamba is relatively new and less widely used than Conda. That means there are probably more undiscovered bugs, and that new bugs may take longer to be discovered. Mamba is nevertheless worth considering when a DevOps chain tests and deploys an environment (e.g., Docker images) with continuous integration pipelines: Conda has a reputation for being slow when resolving complex sets of dependencies, so CI jobs can take longer than they need to.

  • Install Mamba when you already have Conda
Terminal.png inside:
conda install mamba -c conda-forge
  • Installing packages is similarly easy, for example:
Terminal.png inside:
mamba install python=3.8 jupyter -c conda-forge

To go further:

Build your HPC-AI framework with conda

Here are some pointers to help you set up your software environment for HPC or AI with conda.