Conda: Difference between revisions

From Grid5000
Revision as of 12:05, 4 March 2023

Note.png Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

Note.png Note

The purpose of this document is to:

  • explain how to use conda on Grid'5000
  • give examples of installing and configuring software with conda to create environments for running HPC and AI jobs on Grid'5000

It was written by consolidating the following different information resources:


Introduction

Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.

The conda package and environment manager is included in all versions of Anaconda®, Miniconda, and Anaconda Repository. Conda is also available on conda-forge, a community channel.

Anaconda or Miniconda?

Anaconda contains a full distribution of packages while Miniconda is a condensed version that contains the essentials for standard purposes.

Reference:

Conda usage

Warning.png Warning

Conda packages are installed in $HOME/.conda. You could therefore rapidly saturate the disk quota of your $HOME.
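
To keep an eye on this, the space used under $HOME/.conda can be measured with a few lines of Python. This is only a minimal sketch using the standard library: the function name dir_size_bytes is illustrative, hard-linked files may be counted more than once, and it does not query the actual Grid'5000 quota.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Return the total size in bytes of the files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Assumption: conda stores environments and package caches in ~/.conda
    conda_dir = Path.home() / ".conda"
    size_gib = dir_size_bytes(conda_dir) / 1024**3
    print(f"{conda_dir}: {size_gib:.2f} GiB")
```

If the directory does not exist yet, the function simply returns 0.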

Conda environments

Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.

When you begin using conda, you already have a default environment named "base". You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.

  • List all your environments
Terminal.png $:
conda info --envs

or

Terminal.png $:
conda env list
  • Create a new environment
Terminal.png $:
conda create --name ENVNAME
  • Activate this environment before installing packages
Terminal.png $:
conda activate ENVNAME
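
The environment list can also be read from a script. The sketch below assumes the JSON form of the listing command (conda env list --json), which returns an "envs" array of environment paths; the helper name env_names and the sample paths are hypothetical.

```python
import json
from pathlib import PurePosixPath
from typing import List


def env_names(conda_json: str) -> List[str]:
    """Extract the directory name of each environment from `conda env list --json` output."""
    prefixes = json.loads(conda_json).get("envs", [])
    return [PurePosixPath(prefix).name for prefix in prefixes]


# Hypothetical sample; real output lists the paths found on your account.
sample = '{"envs": ["/opt/miniconda3", "/home/login/.conda/envs/ENVNAME"]}'
print(env_names(sample))
```

Note that the base environment appears under the name of its installation directory rather than as "base". To feed real data to the function, the command output can be captured with subprocess.run(["conda", "env", "list", "--json"], capture_output=True, text=True).stdout.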

For further information:

Conda channels and packages

Channels are the locations of the repositories where Conda looks for packages. Channels may point to a Cloud repository or a private location on a remote or local repository that you or your organization created. In its default configuration, Conda can install and manage the over 7,500 packages at https://repo.anaconda.com/pkgs/ that are built, reviewed, and maintained by Anaconda. This is the default Conda channel, which may require a paid commercial license, as described in the repository terms of service.

Other useful channels:

Some tips:

  • List all packages + source channels
Terminal.png $:
conda list --show-channel-urls
  • Install a package from specific channel
Terminal.png $:
conda install -c CHANNELNAME PKG1 PKG2
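
When an environment mixes several channels, it can help to group the installed packages by their source channel. The sketch below parses text in the layout printed by conda list --show-channel-urls, assuming four whitespace-separated columns (name, version, build, channel) and comment lines starting with "#". The helper name and the sample listing are illustrative, not real output.

```python
from collections import defaultdict
from typing import Dict, List


def packages_by_channel(listing: str) -> Dict[str, List[str]]:
    """Group package names by the channel column of a `conda list` listing."""
    grouped = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the commented header
        channel = fields[3] if len(fields) >= 4 else "<unknown>"
        grouped[channel].append(fields[0])
    return dict(grouped)


# Hypothetical listing used only to exercise the parser.
sample_listing = """\
# packages in environment at /home/login/.conda/envs/ENVNAME:
#
# Name                    Version                   Build  Channel
numpy                     1.24.2          py310h8deb116_0    conda-forge
python                    3.10.9               h7a1cb2a_0    defaults
"""
print(packages_by_channel(sample_listing))
```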

For more information:

Conda package installation

TODO

Conda shell activation

Conda shell activation is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment.

The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library.

Activate your Conda shell environment as follows:

Terminal.png $:
eval "$(conda shell.bash hook)"

By default, you are located in the base Conda environment, which corresponds to the base installation of Conda.

Suggested reading

Use conda on Grid'5000

Conda is already available in Grid'5000 as a module. You don't need to install Anaconda or Miniconda on Grid'5000! To make it available on a node or on a frontend, you need to load the Conda module as follows:

Terminal.png inside:
module load miniconda3

Basic conda usage

  • Go to your favorite Grid'5000 site
Terminal.png inside:
ssh grenoble.g5k
  • Load conda and source it
Terminal.png fgrenoble:
module load miniconda3
Terminal.png fgrenoble:
eval "$(conda shell.bash hook)"
  • Create an environment (specify a Python version; otherwise, it is the module default version)
Terminal.png fgrenoble:
conda create -y -n <name> python=x.y
  • Load this environment
Terminal.png fgrenoble:
conda activate <name>
  • Exit from the environment activated with conda activate
Terminal.png fgrenoble:
conda deactivate
  • To correctly delete the environment
Terminal.png fgrenoble:
conda env remove --name <name>
  • To remove unused packages and the cache. Do not be concerned if this appears to try to delete packages from the system (i.e. non-local) environment.
Terminal.png fgrenoble:
conda clean -a


Warning.png Warning

Creating an environment can be time- and resource-consuming. Preferably reserve and connect to a node with the oarsub command (this is mandatory if you need access to a specific hardware resource such as a GPU).


AI libraries

Use NVIDIA tools

NVIDIA libraries are available via Conda and you can manage project specific versions of the NVIDIA CUDA Toolkit, NCCL, and cuDNN using Conda.

NVIDIA actually maintains their own Conda channel and the versions of CUDA Toolkit available from the default channels are the same as those you will find on the NVIDIA channel.

  • To compare versions and build numbers between the default and nvidia channels
Terminal.png inside:
conda search --channel nvidia cudatoolkit

Also compare the version limitations of the different channels, e.g. NVIDIA.

See:

Cudatoolkit

  • Load cudatoolkit from nvidia channel in a dedicated environment
Terminal.png inside:
conda create --name NvidiaTools
Terminal.png inside:
conda activate NvidiaTools
Terminal.png inside:
conda install cudatoolkit -c nvidia

Cuda

You can use the conda-forge or nvidia channels to install cuda.

  • Installing Previous CUDA Releases

All Conda packages released under a specific CUDA version are labeled with that release version. To install a previous version, include that label in the install command such as:

Terminal.png inside:
conda install cuda -c nvidia/label/cuda-11.3.0

Note: install it in a dedicated environment, or first uninstall any existing cuda installation as follows:

Terminal.png inside:
conda remove cuda
  • To display installed version of Nvidia cuda compiler
Terminal.png inside:
nvcc --version

PyTorch

  • Simple installation of PyTorch from the nvidia channel
Terminal.png flille:
conda install pytorch -c nvidia
  • Go to the PyTorch website to find the conda installation command that suits you
    • Example combining PyTorch Stable (1.13.1), Python, and CUDA 11.6:
Terminal.png inside:
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
  • Verify your installation
Terminal.png flille:
which python

/home/login/.conda/envs/TutoConda/bin/python

  • Construct a randomly initialized tensor.
Terminal.png flille:
python
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.3485, 0.6268, 0.8004],
        [0.3265, 0.9763, 0.5085],
        [0.6087, 0.6940, 0.8929],
        [0.2143, 0.6307, 0.5182],
        [0.0076, 0.6455, 0.5223]])
  • To print the CUDA version
>>> import torch
>>> print("Pytorch CUDA Version is ", torch.version.cuda)
Pytorch CUDA Version is  11.6
  • Work on a GPU node: to reserve a single GPU (with the associated CPU cores and share of memory) in interactive mode, run:
Terminal.png flille:
oarsub -l gpu=1 -I
  • Load miniconda3 and activate your PyTorch environment
Terminal.png gpunode:
module load miniconda3
Terminal.png gpunode:
eval "$(conda shell.bash hook)"
Terminal.png gpunode:
conda activate TutoConda
  • Launch python
Terminal.png gpunode:
python
  • To check whether CUDA is supported
>>> import torch
>>> print("Whether CUDA is supported by our system: ", torch.cuda.is_available())
Whether CUDA is supported by our system:  True


  • To know the CUDA device ID and name of the device
>>> import torch
>>> Cuda_id = torch.cuda.current_device()
>>> print("CUDA Device ID: ", torch.cuda.current_device())
CUDA Device ID:  0
>>> print("Name of the current CUDA Device: ", torch.cuda.get_device_name(Cuda_id))
Name of the current CUDA Device:  GeForce GTX 1080 Ti
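
The checks above are often folded into a single device selection with a CPU fallback, so the same script runs on a frontend and on a GPU node. The following is a minimal sketch rather than an official recipe; the helper name pick_device is illustrative, and it also returns "cpu" when PyTorch is not installed in the active environment.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Selected device:", device)
```

With PyTorch installed, the returned string can be passed to tensor constructors, for example torch.rand(5, 3, device=device).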

TensorFlow

TensorFlow offers multiple levels of abstraction so you can choose the right one for your needs. Build and train models by using the high-level Keras API, which makes getting started with TensorFlow and machine learning easy.

  • Install the current release of GPU TensorFlow
Terminal.png inside:
conda create -n TutoConda tensorflow
  • Work on a GPU node: to reserve a single GPU (with the associated CPU cores and share of memory) in interactive mode, run:
Terminal.png flyon:
oarsub -l gpu=1 -I
  • Load miniconda3 and activate your TensorFlow environment
Terminal.png gpunode:
module load miniconda3
Terminal.png gpunode:
eval "$(conda shell.bash hook)"
Terminal.png gpunode:
conda activate TutoConda
  • Launch python
Terminal.png gpunode:
python
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
  • Test the installation
Terminal.png gpunode:
python -c "import tensorflow as tf; x = [[2.]]; print('tensorflow version', tf.__version__); print('hello, {}'.format(tf.matmul(x, x)))"
  • It displays
tensorflow version 2.0.0-rc1
hello, [[4.]]

To go further :


Scikit-learn

As explained on the scikit-learn website, you can use conda to install the latest official release into a conda environment.

  • Installation
Terminal.png inside:
conda create --name sklearn-env -c conda-forge scikit-learn
Terminal.png inside:
conda activate sklearn-env
  • Check your installation version
Terminal.png inside:
conda list scikit-learn
  • Test with a script
Terminal.png inside:
python -c "import sklearn; sklearn.show_versions()"
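
Beyond printing versions, a short fit-and-predict run confirms that the library actually works. This is a minimal sketch with made-up data points on the line y = 2x + 1; the function name sklearn_smoke_test is illustrative, and it returns None when scikit-learn is not installed in the active environment.

```python
import importlib.util
from typing import Optional


def sklearn_smoke_test() -> Optional[float]:
    """Fit y = 2x + 1 with LinearRegression and return the prediction for x = 3."""
    if importlib.util.find_spec("sklearn") is None:
        return None  # scikit-learn is not installed in this environment
    from sklearn.linear_model import LinearRegression

    model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
    return float(model.predict([[3.0]])[0])


result = sklearn_smoke_test()
print("prediction for x = 3:", result)
```

A working installation should give a prediction very close to 7.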



Keras

Note.png Note

This paragraph is based on this tutorial User:Ibada/Tuto Deep Learning

Keras is a high-level neural networks API, written in Python, which is used as a wrapper around Theano, TensorFlow or CNTK. Keras makes it much easier to create deep learning experiments than using Theano or TensorFlow directly. It is the recommended tool for beginners, and even for advanced users who do not want to spend too much time dealing with the complexity of low-level libraries such as Theano and TensorFlow.


  • Check that the conda binary path is in your $PATH (see the previous section) by typing this command:
Terminal.png inside:
which python
  • The output must contain the path of the conda installation
    • something like /grid5000/spack/opt/spack/linux-debianNN-x86_64/gcc-XX.Y.Z/miniconda3-*-***/bin/python inside Grid'5000 (frontend or node)
    • something like ~/miniconda*/bin on your workstation
  • If it does, you can install keras:
Terminal.png inside:
conda install keras


Other libraries

GCC

Use conda-forge as the channel

  • Install the latest version of gcc via conda-forge
Terminal.png inside:
conda install -c conda-forge gcc_linux-64 gxx_linux-64

OpenMPI

Here is an example of installing Open MPI in a conda environment, optimized with ucx to take advantage of the cluster's high-bandwidth, low-latency network.

UCX exposes a set of abstract communication primitives that utilize the best of available hardware resources and offloads. These include RDMA (InfiniBand and RoCE), TCP, GPUs, shared memory, and network atomic operations.

We run an MPI program on a cluster with an Omni-Path or InfiniBand network connection and compare the performance between two nodes with and without the ucx driver.

We first choose an appropriate cluster using the Grid'5000 Hardware Documentation: the dahu cluster at Grenoble.

  • Log in to the Grenoble Grid'5000 frontend
Terminal.png outside:
ssh login@access.grid5000.fr
Terminal.png inside:
ssh grenoble
  • Load conda and source it
Terminal.png fgrenoble:
module load miniconda3 && eval "$(conda shell.bash hook)"
  • Create an isolated conda environment for MPI
Terminal.png fgrenoble:
conda create --name mpienv && conda activate mpienv
  • Install the openmpi, ucx and gcc packages into the mpienv environment
Terminal.png (mpienv):
conda install -c conda-forge gcc_linux-64 openmpi ucx
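  • Install NetPIPE to test latency and network throughput (NetPIPE is not available as a conda package; the commands below assume that the NetPIPE-3.7.2.tar.gz archive is placed in $HOME/SRC before it is extracted)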
Terminal.png (mpienv):
cd $HOME ; mkdir SRC && cd SRC
Terminal.png (mpienv):
tar zvxf NetPIPE-3.7.2.tar.gz
Terminal.png (mpienv):
cd NetPIPE-3.7.2/ && make mpi
  • Reserve 2 cores on 2 separate nodes (one core per node) and start an interactive session:
Terminal.png (mpienv):
oarsub -I -p dahu -l /nodes=2/core=1
  • On node dahu-X: load conda, activate mpienv, and modify PATH
Terminal.png dahu:
module load miniconda3 && eval "$(conda shell.bash hook)"
Terminal.png dahu:
conda activate mpienv
Terminal.png dahu:
export PATH=~/SRC/NetPIPE-3.7.2:$PATH
  • Run MPI without ucx (use the standard network):
Terminal.png dahu:
mpirun -np 2 --machinefile $OAR_NODEFILE --prefix $CONDA_PREFIX --mca plm_rsh_agent oarsh NPmpi
0: dahu-3
1: dahu-30
Now starting the main loop
  0:       1 bytes   6400 times -->      0.54 Mbps in      14.21 usec
  1:       2 bytes   7035 times -->      1.07 Mbps in      14.20 usec
  2:       3 bytes   7043 times -->      1.61 Mbps in      14.22 usec
...
116: 4194304 bytes     12 times -->   8207.76 Mbps in    3898.75 usec
117: 4194307 bytes     12 times -->   8161.45 Mbps in    3920.87 usec
...
  • Run MPI with ucx (use the fast network):
Terminal.png dahu:
mpirun -np 2 --machinefile $OAR_NODEFILE --prefix $CONDA_PREFIX --mca plm_rsh_agent oarsh --mca pml ucx --mca osc ucx NPmpi
0: dahu-3
1: dahu-30
Now starting the main loop
  0:       1 bytes  19082 times -->      1.69 Mbps in       4.50 usec
  1:       2 bytes  22201 times -->      3.08 Mbps in       4.95 usec
  2:       3 bytes  20212 times -->      4.46 Mbps in       5.13 usec
...
116: 4194304 bytes     46 times -->  30015.10 Mbps in    1066.13 usec
117: 4194307 bytes     46 times -->  30023.66 Mbps in    1065.83 usec
...
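
To quantify the difference between the two runs, the NetPIPE lines can be parsed and compared. The sketch below uses a regular expression written for the line layout shown above and reuses four of the measurements listed there; the function name parse_netpipe is illustrative.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "  0:       1 bytes   6400 times -->      0.54 Mbps in      14.21 usec"
LINE_RE = re.compile(
    r"^\s*\d+:\s+(\d+) bytes\s+\d+ times -->\s+([\d.]+) Mbps in\s+([\d.]+) usec"
)


def parse_netpipe(output: str) -> Dict[int, Tuple[float, float]]:
    """Map each message size in bytes to (throughput in Mbps, time in usec)."""
    results = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            size, mbps, usec = match.groups()
            results[int(size)] = (float(mbps), float(usec))
    return results


without_ucx = parse_netpipe("""
  0:       1 bytes   6400 times -->      0.54 Mbps in      14.21 usec
116: 4194304 bytes     12 times -->   8207.76 Mbps in    3898.75 usec
""")
with_ucx = parse_netpipe("""
  0:       1 bytes  19082 times -->      1.69 Mbps in       4.50 usec
116: 4194304 bytes     46 times -->  30015.10 Mbps in    1066.13 usec
""")

latency_gain = without_ucx[1][1] / with_ucx[1][1]
throughput_gain = with_ucx[4194304][0] / without_ucx[4194304][0]
print(f"1-byte message time: {latency_gain:.1f}x lower with ucx")
print(f"4 MiB throughput: {throughput_gain:.1f}x higher with ucx")
```

On these figures, ucx reduces the 1-byte message time by a factor of about 3.2 and raises the throughput for 4 MiB messages by a factor of about 3.7.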

How to share conda environments between multiple users

By default, conda and all packages are installed locally with a user-specific configuration. In the Grid'5000 context, conda installation is already a shared installation because it is installed via module.

The command $ conda info shows the location of conda and of the base environment on NFS storage:

     active environment : base
    active env location : /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg
...
       base environment : /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg  (read only)
...
       envs directories : /home/lmirtain/.conda/envs
                          /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg/envs
...

Grid'5000 conda directory: /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg

You can make conda environments and any number of packages available to a group of one or more users. We explain here how to do it in the case of a group of Grid'5000 users.

There are two approaches to sharing conda environments with other users.



Export an environment as a yaml file

  • Export it as follows:
$ conda env export > environment.yml 
  • Share it by putting the yaml file in your public folder
$ cp environment.yml ~/public/
  • Other users can create the environment from the environment.yml file
$ conda env create -f /home/<login>/public/environment.yml
  • Advantage: it prevents other users from damaging the environment by adding packages that could conflict with other packages, or by deleting packages that another user might need.
  • Drawback: it is not a true shared environment
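
An exported file normally ends with a prefix: line that records the path of the environment on the exporting user's account. It is commonly removed before sharing, since it is meaningless for other users. The sketch below does this with plain string handling; the function name strip_prefix and the sample file content are illustrative.

```python
def strip_prefix(environment_yml: str) -> str:
    """Drop the top-level `prefix:` line from an exported environment file."""
    kept = [
        line
        for line in environment_yml.splitlines()
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


# Hypothetical export used only to exercise the function.
exported = """name: ENVNAME
channels:
  - conda-forge
dependencies:
  - python=3.10
prefix: /home/login/.conda/envs/ENVNAME
"""
print(strip_prefix(exported), end="")
```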

Create the environment on shared storage with the --prefix option of conda create, then instruct your users to add the shared path to their conda config file

  • Creation of the shared environment

$ conda create --prefix /path/to/share/storage/ENVNAME

  • Activation of the shared environment

$ conda activate /path/to/share/storage/ENVNAME


On Grid'5000, the appropriate shared storage is the Group Storage associated with the group that wants to share conda environments.

For example: replace /path/to/share/storage/ENVNAME with /srv/storage/storage_name@server_hostname_(fqdn)
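
Before activating an environment by path, a user can check that the shared directory really is a conda environment. The sketch below relies on the fact that conda keeps per-package metadata in a conda-meta subdirectory of every environment; the function name is_conda_env is illustrative and the path in the example is the placeholder used above.

```python
from pathlib import Path


def is_conda_env(prefix: Path) -> bool:
    """Return True if prefix looks like a conda environment (it has a conda-meta directory)."""
    return (prefix / "conda-meta").is_dir()


if __name__ == "__main__":
    # Placeholder path from the text; replace it with your Group Storage location.
    shared_env = Path("/path/to/share/storage/ENVNAME")
    print(shared_env, "is a conda environment:", is_conda_env(shared_env))
```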


Create a specific environment for PowerPC arch

Because the version of pytorch in PowerAI is too old for py38 or py39, we must use python 3.7.

  • Create a dedicated environment
Terminal.png inside:
conda create --name pytorch-ppc64-py37 python=3.7
  • Load it
Terminal.png inside:
conda activate pytorch-ppc64-py37
  • Install pytorch in this environment
Terminal.png inside:
conda install pytorch -c

Mamba as an alternative to conda

mamba is a reimplementation of the conda package manager in C++. It consists of:

  • mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions
  • micromamba: a pure C++-based CLI, self-contained in a single-file executable
  • libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built

mamba is fully compatible with conda packages and supports most of conda’s commands.

Mamba is relatively new and less widely used than Conda. That means there are probably more undiscovered bugs, and that new bugs may take longer to be discovered. That said, the agility displayed by the Mamba developers suggests they would probably fix newly discovered bugs faster.

Mamba is worth considering when using a DevOps chain to test and deploy an environment (e.g. a Docker image) with continuous integration pipelines. Conda has a reputation for being slow when resolving complex sets of dependencies, so CI jobs can take longer than they need to.

  • Installing Mamba when you already have conda
$ conda install mamba -n base -c conda-forge
  • Installing packages is similarly easy, example:
$ mamba install python=3.8 jupyter -c conda-forge

To go further with mamba