* [[User:Bjonglez/Debian11/Deep Learning Frameworks|Deep Learning Frameworks tutorial]], Benjamin Jonglez
}}

== Install AI libraries ==

=== Use NVIDIA tools ===

NVIDIA libraries are available via Conda, which lets you manage project-specific versions of the NVIDIA CUDA Toolkit, NCCL and cuDNN. NVIDIA maintains its own Conda channel. The versions of the CUDA Toolkit available from the default channels are the same as those found on the NVIDIA channel.

* To compare the build numbers of the versions available on the default and '''nvidia''' channels:
{{Term|location=inside|cmd=<code class="command">conda search cudatoolkit</code>}}
{{Term|location=inside|cmd=<code class="command">conda search --channel nvidia cudatoolkit</code>}}

See:
* [https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#conda-installation Nvidia doc: Installing CUDA Using Conda]
* [https://towardsdatascience.com/managing-cuda-dependencies-with-conda-89c5d817e7e1 “Best practices” Managing CUDA dependencies with Conda]

==== Cudatoolkit ====

* Install ''cudatoolkit'' from the '''nvidia''' channel:
{{Term|location=inside|cmd=<code class="command">conda install cudatoolkit -c nvidia</code>}}

Note: do not forget to create a dedicated environment first.

==== Cuda ====

''cuda'' is available in both the '''conda-forge''' and '''nvidia''' channels.

* Install ''cuda'' from the '''nvidia''' channel:
{{Term|location=inside|cmd=<code class="command">conda install cuda -c nvidia</code>}}

Note: do not forget to create a dedicated environment first.

* Installing previous CUDA releases

All Conda packages released under a specific CUDA version are labeled with that release version. To install a previous version, include that label in the install command so that all CUDA dependencies come from the desired CUDA version. For instance, to install CUDA 11.3.0:
{{Term|location=inside|cmd=<code class="command">conda install cuda -c nvidia/label/cuda-11.3.0</code>}}

* To display the version of the NVIDIA CUDA compiler installed:
{{Term|location=inside|cmd=<code class="command">nvcc --version</code>}}

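If you need the compiler version from a script (for instance to log it at the start of a job), the sketch below runs <code>nvcc --version</code> and extracts the <code>release X.Y</code> field. It assumes the usual output line <code>Cuda compilation tools, release X.Y, VX.Y.Z</code>; the helper names <code>parse_nvcc_release</code> and <code>nvcc_release</code> are illustrative, not part of any tool.

```python
import re
import shutil
import subprocess
from typing import Optional

# Assumed format of the relevant nvcc line, e.g.
# "Cuda compilation tools, release 11.6, V11.6.124"
_RELEASE_RE = re.compile(r"release\s+(\d+\.\d+)")


def parse_nvcc_release(text: str) -> Optional[str]:
    """Return the 'X.Y' release found in `nvcc --version` output, or None."""
    match = _RELEASE_RE.search(text)
    return match.group(1) if match else None


def nvcc_release() -> Optional[str]:
    """Run nvcc if it is on PATH and return its CUDA release, or None."""
    if shutil.which("nvcc") is None:
        return None  # nvcc not found: the cuda package is probably not installed
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("nvcc CUDA release:", nvcc_release())
```

Run it inside the activated environment; it prints <code>None</code> when <code>nvcc</code> is not on the <code>PATH</code>.
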
=== PyTorch ===

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. It detects GPU availability at run time.

; Installation

* Simple installation of PyTorch using the nvidia channel:
{{Term|location=flille|cmd=<code class="command">conda install pytorch -c nvidia</code>}}
Note: do not forget to create a dedicated environment first.

* For a full installation, you might want to combine a stable PyTorch release (e.g., 1.13.1) with a specific CUDA version (e.g., 11.6):
{{Term|location=flille|cmd=<code class="command">conda install pytorch==1.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia</code>}}

{{Warning|text=You must adapt the version of pytorch-cuda to the CUDA version supported by the NVIDIA driver of the GPU nodes you use (as reported by <code>nvidia-smi</code>). If the driver does not support that CUDA version, PyTorch will not detect the GPU.}}

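To check this before installing, you can compare the <code>pytorch-cuda</code> version you plan to use with the CUDA version shown in the header of <code>nvidia-smi</code> on a GPU node (<code>CUDA Version: X.Y</code>). This sketch applies a conservative rule of thumb, driver version >= requested version; that rule, the regular expression and the helper names are assumptions, and CUDA minor-version compatibility may allow other combinations.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Assumed header format printed by nvidia-smi, e.g. "... CUDA Version: 11.6 |"
_DRIVER_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def parse_driver_cuda(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) CUDA version supported by the driver, or None."""
    match = _DRIVER_CUDA_RE.search(nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_version(version: str) -> Tuple[int, int]:
    """Turn an 'X.Y' string such as '11.6' into a comparable (X, Y) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def driver_supports(driver_cuda: Tuple[int, int], wanted: str) -> bool:
    """Conservative rule of thumb: the driver must support at least `wanted`."""
    return driver_cuda >= parse_version(wanted)


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        driver = parse_driver_cuda(out)
        print("Driver CUDA version:", driver)
        if driver is not None:
            print("pytorch-cuda=11.6 looks compatible:", driver_supports(driver, "11.6"))
    else:
        print("nvidia-smi not found: run this on a GPU node")
```
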
; Verify your installation

* Check which Python binary is used:
{{Term|location=flille|cmd=<code class="command">which python</code>}}
<code>/home/</code><code class="replace">login</code><code>/.conda/envs/</code><code class="replace">env_name</code><code>/bin/python</code>

* Construct a randomly initialized tensor:
{{Term|location=flille|cmd=<code class="command">python</code>}}
<pre>
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.3485, 0.6268, 0.8004],
        [0.3265, 0.9763, 0.5085],
        [0.6087, 0.6940, 0.8929],
        [0.2143, 0.6307, 0.5182],
        [0.0076, 0.6455, 0.5223]])
</pre>

* Print the CUDA version PyTorch was built with:
{{Term|location=flille|cmd=<code class="command">python</code>}}
<pre>
>>> import torch
>>> print("PyTorch CUDA version is", torch.version.cuda)
PyTorch CUDA version is 11.6
</pre>

; Verify your installation on a GPU node

* Reserve only one GPU (with the associated CPU cores and share of memory) in interactive mode:
{{Term|location=flille|cmd=<code class="command">oarsub -l gpu=1 -I</code>}}
* Load miniconda3 and activate your PyTorch environment:
{{Term|location=gpunode|cmd=<code class="command">module load miniconda3; eval "$(conda shell.bash hook)"; conda activate <env_name></code>}}
* Launch python and execute the following code:
{{Term|location=gpunode|cmd=<code class="command">python</code>}}
<pre>
>>> import torch
>>> print("Whether CUDA is supported by our system:", torch.cuda.is_available())
Whether CUDA is supported by our system: True
</pre>

* To get the ID and name of the current CUDA device, you can run:
{{Term|location=gpunode|cmd=<code class="command">python</code>}}
<pre>
>>> import torch
>>> cuda_id = torch.cuda.current_device()
>>> print("CUDA Device ID:", cuda_id)
CUDA Device ID: 0
>>> print("Name of the current CUDA Device:", torch.cuda.get_device_name(cuda_id))
Name of the current CUDA Device: GeForce GTX 1080 Ti
</pre>

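The interactive checks above can be bundled into one script, e.g. to run at the start of a batch job on a GPU node. This is a sketch: it only calls the PyTorch functions used on this page plus <code>torch.__version__</code>, and it reports <code>installed: False</code> instead of failing when PyTorch is missing from the active environment. The function name <code>torch_gpu_report</code> is hypothetical.

```python
import importlib.util
from typing import Any, Dict


def torch_gpu_report() -> Dict[str, Any]:
    """Collect the PyTorch/CUDA facts checked interactively above."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active Conda environment
        return {"installed": False}

    import torch  # imported lazily so the script also runs without PyTorch

    report: Dict[str, Any] = {
        "installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        device_id = torch.cuda.current_device()
        report["device_id"] = device_id
        report["device_name"] = torch.cuda.get_device_name(device_id)
    return report


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

On a node without a GPU, <code>cuda_available</code> is simply <code>False</code> and no device information is reported.
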
=== Tensorflow ===

TensorFlow offers multiple levels of abstraction, so you can choose the right one for your needs. You can build and train models with the high-level Keras API, which makes getting started with TensorFlow and machine learning easy.

; Installation

* Install TensorFlow from the '''conda-forge''' channel:
{{Term|location=inside|cmd=<code class="command">conda install -c conda-forge tensorflow</code>}}
Note: do not forget to create a dedicated environment first.

; Verify your installation

* Launch python:
{{Term|location=gpunode|cmd=<code class="command">python</code>}}
<pre>
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
</pre>

; Verify your installation on a GPU node

* Reserve only one GPU (with the associated CPU cores and share of memory) in interactive mode:
{{Term|location=flyon|cmd=<code class="command">oarsub -l gpu=1 -I</code>}}

* Load miniconda3 and activate your TensorFlow environment:
{{Term|location=gpunode|cmd=<code class="command">module load miniconda3; eval "$(conda shell.bash hook)"; conda activate <env_name></code>}}

* Test the installation:
{{Term|location=gpunode|cmd=<code class="command">python</code>}}
<pre>
>>> import tensorflow as tf
>>> print('tensorflow version', tf.__version__)
tensorflow version 2.0.0-rc1
>>> x = [[2.]]
>>> print('hello, {}'.format(tf.matmul(x, x).numpy()))
hello, [[4.]]
</pre>

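A similar non-interactive check for TensorFlow is sketched below. It assumes TensorFlow 2.1 or later, where <code>tf.config.list_physical_devices</code> is available (the 2.0 release shown above only has it under <code>tf.config.experimental</code>), and uses <code>tf.test.is_built_with_cuda()</code> to tell CPU-only builds apart. The helper name <code>tensorflow_gpu_report</code> is hypothetical.

```python
import importlib.util
from typing import Any, Dict


def tensorflow_gpu_report() -> Dict[str, Any]:
    """Report the TensorFlow version and the GPUs it can see, if any."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in the active Conda environment
        return {"installed": False}

    import tensorflow as tf  # lazy import: heavy, and may be absent

    gpus = tf.config.list_physical_devices("GPU")  # TensorFlow >= 2.1
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

An empty <code>gpus</code> list on a GPU node usually means the TensorFlow build or its CUDA libraries do not match the node.
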
To go further:
* https://docs.anaconda.com/anaconda/user-guide/tasks/tensorflow/

=== Scikit-learn ===

Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation and many other utilities.

; Installation

{{Term|location=fnancy|cmd=<code class="command">conda install -c conda-forge scikit-learn</code>}}
Note: do not forget to create a dedicated environment first.

; Verify your installation

{{Term|location=fnancy|cmd=<code class="command">python</code>}}
<pre>
>>> import sklearn
>>> sklearn.show_versions()
System:
python: 3.10.9 (main, Mar 1 2023, 18:23:06) [GCC 11.2.0]
executable: /home/xxxx/.conda/envs/test/bin/python
machine: Linux-5.10.0-21-amd64-x86_64-with-glibc2.31

Python dependencies:
pip: 22.3.1
setuptools: 65.6.3
sklearn: 1.0.2
numpy: 1.23.5
scipy: 1.8.1
Cython: None
pandas: None
matplotlib: None
joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True
</pre>

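<code>sklearn.show_versions()</code> only covers scikit-learn and its dependencies. To record the versions of all the frameworks from this page in one go, e.g. in a job log, you can read the installed package metadata with the standard library (<code>importlib.metadata</code>, Python 3.8+) without importing the heavy packages. The distribution names in the list are assumptions: conda packages usually register the same names as their pip counterparts (for example <code>torch</code> for the conda package <code>pytorch</code>), but check with <code>conda list</code> if a package is reported as missing.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names to look up (assumed to match what conda registers)
DEFAULT_PACKAGES = ("numpy", "scipy", "scikit-learn", "torch", "tensorflow")


def installed_versions(names: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15} {version or 'not installed'}")
```
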
To go further:
* [https://scikit-learn.org/stable/install.html scikit-learn Installation]
* [https://scikit-learn.org/stable/tutorial/basic/tutorial.html scikit-learn.org Tutorials]
* [https://www.dataquest.io/blog/sci-kit-learn-tutorial/ Dataquest Scikit-learn Tutorial]
* [https://www.digitalocean.com/community/tutorials/python-scikit-learn-tutorial Another Python SciKit Learn Tutorial]

=== Keras ===

Keras is a high-level neural network API, written in Python, that runs on top of TensorFlow. It was developed with a focus on enabling fast experimentation. It is recommended for beginners, and even for advanced users who do not want to spend too much time dealing with the complexity of low-level libraries such as TensorFlow.

; Installation

* Since version 2.4, Keras focuses exclusively on its TensorFlow implementation. Therefore, to use Keras, you need to have the TensorFlow package installed:
{{Term|location=frennes|cmd=<code class="command">conda install -c conda-forge tensorflow</code>}}

Note: do not forget to create a dedicated environment first.

; Verify the installation

* Check which Python binary is used:
{{Term|location=frennes|cmd=<code class="command">which python</code>}}
<code>/home/</code><code class="replace">login</code><code>/.conda/envs/</code><code class="replace">env_name</code><code>/bin/python</code>

* Print the Keras version:
{{Term|location=frennes|cmd=<code class="command">python</code>}}
<pre>
>>> from tensorflow import keras
>>> print(keras.__version__)
2.10.0
</pre>

To go further:
* [https://keras.io/getting_started/intro_to_keras_for_researchers/ Keras examples]
* [https://blog.keras.io/ Keras blog]

== Install HPC libraries ==

=== GCC ===

* Install the latest version of GCC from the '''conda-forge''' channel:
{{Term|location=inside|cmd=<code class="command">conda install -c conda-forge gcc_linux-64 gxx_linux-64</code>}}

Note: do not forget to create a dedicated environment first.

=== OpenMPI ===

Here is an example of installing Open MPI with UCX support in a Conda environment, to take advantage of the cluster's high-bandwidth, low-latency network.

UCX exposes a set of abstract communication primitives that make the best use of the available hardware resources and offloads. These include RDMA (InfiniBand and RoCE), TCP, GPUs, shared memory and network atomic operations.

; Installation

* Install OpenMPI, UCX and GCC from the '''conda-forge''' channel:
{{Term|location=inside|cmd=<code class="command">conda install -c conda-forge gcc_linux-64 openmpi ucx</code>}}
Note: do not forget to create a dedicated environment first.

; Test installation via NetPIPE

* Build NetPIPE to test latency and network throughput (NetPIPE is not available as a Conda package):
{{Term|location=fgrenoble|cmd=<code class="command">cd $HOME ; mkdir SRC && cd SRC</code>}}
{{Term|location=fgrenoble|cmd=<code class="command">wget https://src.fedoraproject.org/lookaside/pkgs/NetPIPE/NetPIPE-3.7.2.tar.gz/653071f785404bb68f8aaeff89fb1f33/NetPIPE-3.7.2.tar.gz</code>}}
{{Term|location=fgrenoble|cmd=<code class="command">tar zvxf NetPIPE-3.7.2.tar.gz</code>}}
{{Term|location=fgrenoble|cmd=<code class="command">cd NetPIPE-3.7.2/ && make mpi</code>}}

* Reserve 1 core on each of 2 separate nodes and start an interactive session:
{{Term|location=fgrenoble|cmd=<code class="command">oarsub -I -p dahu -l /nodes=2/core=1</code>}}

Note: choose a cluster with an Omni-Path or InfiniBand network to compare the performance between two nodes with and without the UCX driver. See [https://www.grid5000.fr/w/Hardware Grid'5000 Hardware Documentation].

* On node dahu-X: load Conda, activate your Conda environment and add NetPIPE to your $PATH:
{{Term|location=dahu|cmd=<code class="command">module load miniconda3 && eval "$(conda shell.bash hook)"</code>}}
{{Term|location=dahu|cmd=<code class="command">conda activate <env_name></code>}}
{{Term|location=dahu|cmd=<code class="command">export PATH=~/SRC/NetPIPE-3.7.2:$PATH</code>}}

* Run MPI without UCX (standard network):
{{Term|location=dahu|cmd=<code class="command">mpirun -np 2 --machinefile $OAR_NODEFILE --prefix $CONDA_PREFIX --mca plm_rsh_agent oarsh NPmpi</code>}}
<pre>
0: dahu-3
1: dahu-30
Now starting the main loop
0: 1 bytes 6400 times --> 0.54 Mbps in 14.21 usec
1: 2 bytes 7035 times --> 1.07 Mbps in 14.20 usec
2: 3 bytes 7043 times --> 1.61 Mbps in 14.22 usec
...
116: 4194304 bytes 12 times --> 8207.76 Mbps in 3898.75 usec
117: 4194307 bytes 12 times --> 8161.45 Mbps in 3920.87 usec
...
</pre>

* Run MPI with UCX (high-speed network):
{{Term|location=dahu|cmd=<code class="command">mpirun -np 2 --machinefile $OAR_NODEFILE --prefix $CONDA_PREFIX --mca plm_rsh_agent oarsh --mca pml ucx --mca osc ucx NPmpi</code>}}
<pre>
0: dahu-3
1: dahu-30
Now starting the main loop
0: 1 bytes 19082 times --> 1.69 Mbps in 4.50 usec
1: 2 bytes 22201 times --> 3.08 Mbps in 4.95 usec
2: 3 bytes 20212 times --> 4.46 Mbps in 5.13 usec
...
116: 4194304 bytes 46 times --> 30015.10 Mbps in 1066.13 usec
117: 4194307 bytes 46 times --> 30023.66 Mbps in 1065.83 usec
...
</pre>

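The two outputs above can be compared numerically by parsing NetPIPE's result lines. This sketch assumes the line format shown above (<code>N: SIZE bytes REPS times --> RATE Mbps in TIME usec</code>) and uses only the sample lines from this page: for 1-byte messages the latency drops from 14.21 to 4.50 usec (about 3.2 times lower), and for 4194304-byte messages the throughput rises from about 8.2 to about 30 Gbps (about 3.7 times higher). <code>parse_netpipe</code> and <code>speedup</code> are hypothetical helpers.

```python
import re
from typing import Dict, NamedTuple


class NetPipeSample(NamedTuple):
    size_bytes: int
    mbps: float
    usec: float


# Assumed NetPIPE line format, e.g. "0: 1 bytes 6400 times --> 0.54 Mbps in 14.21 usec"
_LINE_RE = re.compile(
    r"^\s*\d+:\s+(\d+)\s+bytes\s+\d+\s+times\s+-->\s+([\d.]+)\s+Mbps\s+in\s+([\d.]+)\s+usec"
)


def parse_netpipe(output: str) -> Dict[int, NetPipeSample]:
    """Index NetPIPE samples by message size (later lines win on duplicates)."""
    samples: Dict[int, NetPipeSample] = {}
    for line in output.splitlines():
        match = _LINE_RE.match(line)
        if match:
            size = int(match.group(1))
            samples[size] = NetPipeSample(size, float(match.group(2)), float(match.group(3)))
    return samples


def speedup(
    baseline: Dict[int, NetPipeSample], tuned: Dict[int, NetPipeSample], size: int
) -> Dict[str, float]:
    """Latency ratio (baseline/tuned) and throughput ratio (tuned/baseline) at `size`."""
    return {
        "latency_ratio": baseline[size].usec / tuned[size].usec,
        "throughput_ratio": tuned[size].mbps / baseline[size].mbps,
    }


if __name__ == "__main__":
    # Sample lines copied from the two runs shown above
    without_ucx = """0: 1 bytes 6400 times --> 0.54 Mbps in 14.21 usec
116: 4194304 bytes 12 times --> 8207.76 Mbps in 3898.75 usec"""
    with_ucx = """0: 1 bytes 19082 times --> 1.69 Mbps in 4.50 usec
116: 4194304 bytes 46 times --> 30015.10 Mbps in 1066.13 usec"""
    base, fast = parse_netpipe(without_ucx), parse_netpipe(with_ucx)
    print("1 byte:", speedup(base, fast, 1))
    print("4 MiB:", speedup(base, fast, 4194304))
```

In practice, redirect each <code>mpirun</code> output to a file and pass the file contents to <code>parse_netpipe</code>.
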
== Create a specific Conda environment for PowerPC architecture ==

IBM PowerAI provides a Conda channel with dedicated packages compiled for ppc64le: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/#/

* Install a package from IBM PowerAI:
{{Term|location=inside|cmd=<code class="command">conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/ <package></code>}}
Note: do not forget to create a dedicated environment first.

* Some packages in PowerAI might require older dependencies. For instance, the PowerAI version of PyTorch is too old for Python 3.8 or 3.9, so Python 3.7 must be used:
{{Term|location=inside|cmd=<code class="command">conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/ pytorch python=3.7</code>}}

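If the same setup script must work on both x86_64 and ppc64le nodes, it can select the channel from the machine architecture. This sketch uses <code>platform.machine()</code> and the PowerAI channel URL given above; using '''conda-forge''' on other architectures and the helper name <code>install_command</code> are assumptions.

```python
import platform
from typing import List, Optional

POWERAI_CHANNEL = "https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/"


def install_command(packages: List[str], machine: Optional[str] = None) -> List[str]:
    """Build a `conda install` argument list, adding the PowerAI channel on ppc64le."""
    machine = machine or platform.machine()
    cmd = ["conda", "install", "-y"]
    if machine == "ppc64le":
        cmd += ["-c", POWERAI_CHANNEL]
    else:
        cmd += ["-c", "conda-forge"]  # assumption: conda-forge on other architectures
    return cmd + list(packages)


if __name__ == "__main__":
    # e.g. the PowerAI PyTorch build requires Python 3.7
    print(" ".join(install_command(["pytorch", "python=3.7"])))
```
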
To go further with Mamba:
* [https://mamba.readthedocs.io/en/latest/installation.html Mamba installation]

= Build your HPC-IA framework with conda =

Here are some pointers to help you set up your software environment for HPC or AI with conda:
* [[HPC_and_HTC_tutorial]]
* Running [[Run_MPI_On_Grid'5000|MPI applications on Grid'5000]]
* [[Deep_Learning_Frameworks|Deep Learning Frameworks documentation]]

Note
This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

Note
This document was written by consolidating the following information resources:
Introduction
Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.
The conda package and environment manager is included in all versions of Anaconda®, Miniconda, and Anaconda Repository. Conda is also available on conda-forge, a community channel.
Anaconda or Miniconda?
Anaconda contains a full distribution of packages while Miniconda is a condensed version that contains the essentials for standard purposes.
References
Conda usage
Conda shell activation
Conda shell activation is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment.
The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library.
Activate your Conda shell environment as follows:
$ eval "$(conda shell.bash hook)"
By default, you are in the base Conda environment, which corresponds to the base installation of Conda.
Conda environments
Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.
When you begin using conda, you already have a default environment named "base".
You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.
- List all your environments
$ conda info --envs
or
$ conda env list
- Create an environment
$ conda create --name ENVNAME
- Activate this environment before installing packages
$ conda activate ENVNAME
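For scripts, the JSON output of the same command is easier to handle than the table. The sketch below assumes that conda env list --json prints an object whose envs key lists environment paths, and derives each name from the last path component (so the base environment appears under the name of its installation directory). parse_env_list and list_envs are hypothetical helpers.

```python
import json
import os
import shutil
import subprocess
from typing import Dict


def parse_env_list(json_text: str) -> Dict[str, str]:
    """Map environment names (last path component) to their full paths."""
    paths = json.loads(json_text).get("envs", [])
    return {os.path.basename(os.path.normpath(p)): p for p in paths}


def list_envs() -> Dict[str, str]:
    """Run `conda env list --json` if conda is on PATH; return {} otherwise."""
    conda = shutil.which("conda")
    if conda is None:
        return {}  # e.g. the miniconda3 module is not loaded
    out = subprocess.run([conda, "env", "list", "--json"], capture_output=True, text=True, check=True)
    return parse_env_list(out.stdout)


if __name__ == "__main__":
    for name, path in list_envs().items():
        print(f"{name:20} {path}")
```
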
For further information:
Conda package installation
In its default configuration, Conda can install and manage over 7,500 packages from https://repo.anaconda.com/pkgs/ that are built, reviewed and maintained by Anaconda. This is the default Conda channel, and its use may require a paid commercial license, as described in the repository's terms of service.
- Install a package
$ conda install <package>
- Install a specific version of a package:
$ conda install <package>=<version>
- Uninstall a package
$ conda uninstall <package>
For more information:
Conda package installation from channels
Channels are the locations of the repositories where Conda looks for packages. Channels may point to a cloud repository or to a private location on a remote or local repository that you or your organization created. Useful channels are:
To install a package from a specific channel:
$ conda install -c <channel_name> <package>
- List all packages installed with their source channels
$ conda list --show-channel-urls
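conda list also accepts --json, which is convenient to check where each package of an environment came from. This sketch assumes each JSON entry has name, version and channel keys, and groups packages by channel; packages_by_channel is a hypothetical helper and the values in the test sample are illustrative.

```python
import json
import shutil
import subprocess
from collections import defaultdict
from typing import Dict, List


def packages_by_channel(json_text: str) -> Dict[str, List[str]]:
    """Group 'name=version' strings by the channel each package came from."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for pkg in json.loads(json_text):
        groups[pkg.get("channel", "unknown")].append(f"{pkg['name']}={pkg['version']}")
    return {channel: sorted(names) for channel, names in groups.items()}


if __name__ == "__main__":
    if shutil.which("conda"):
        out = subprocess.run(["conda", "list", "--json"], capture_output=True, text=True, check=True)
        for channel, names in packages_by_channel(out.stdout).items():
            print(f"{channel}: {len(names)} packages")
    else:
        print("conda not found: load the miniconda3 module first")
```
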
For more information:
Suggested reading
Load conda on Grid'5000
Conda is already available on Grid'5000 as a module: you do not need to install Anaconda or Miniconda yourself. To make it available on a node or on a frontend, load the Conda module as follows:
- For Miniconda:
fsophia$ module load miniconda3
fsophia$ conda --version
conda 22.11.1
- For Anaconda:
node$ module load anaconda3
node$ conda --version
conda 4.12.0
By default, Conda and all packages are installed locally with a user-specific configuration. In the Grid'5000 context, Conda comes with some pre-existing packages in the base environment.
- To display the location of Conda and of the current environment:
fsophia$ conda info
active environment : base
active env location : /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg
...
base environment : /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg (read only)
...
envs directories : /home/lmirtain/.conda/envs
                   /grid5000/spack/opt/spack/linux-debian11-x86_64/gcc-10.2.0/miniconda3-4.10.3-x6kxdkqihyhysyjs7i4g77wururhgvfg/envs
...
Here, the active environment is base, and it is stored read-only on NFS storage (/grid5000/...).
- To list the packages installed in the current environment:
fsophia$ conda list
Create conda environments on Grid'5000
Basic Conda workflow
Warning
Installing Conda packages can be time- and resource-consuming. Preferably use a node (instead of a frontend) to perform such an operation. Note that using a node is mandatory if you need to access specific hardware resources such as GPUs.
- Load the conda module and activate the Conda shell functions
fgrenoble$ module load miniconda3
fgrenoble$ eval "$(conda shell.bash hook)"
- Create an environment (specify a Python version; otherwise, the module's default version is used)
fgrenoble$ conda create -y -n <name> python=x.y
- Activate the environment
fgrenoble$ conda activate <name>
- Install a package
fgrenoble$ conda install <package_name>
- Exit from the loaded environment
fgrenoble$ conda deactivate
Remove unused Conda environments
Warning
Conda packages are installed in $HOME/.conda. You could therefore rapidly saturate your homedir quota (25GB by default). Do not forget to occasionally remove unused Conda environments to free up space.
- Remove an environment
fgrenoble$ conda env remove --name <name>
- To remove unused packages and the cache (do not be concerned if this appears to try to delete the packages of the system environment, i.e. non-local):
fgrenoble$ conda clean -a
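To find which environments use most of the quota, you can compute their sizes. This standard-library sketch walks ~/.conda/envs and sums apparent file sizes. Because Conda hard-links files between its package cache and environments, the same file may be counted several times, so treat the figures as an upper bound (du -sh ~/.conda gives the actual disk usage). env_sizes and dir_size_bytes are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Dict


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under `root`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def env_sizes(conda_dir: Path = Path.home() / ".conda") -> Dict[str, int]:
    """Size in bytes of each environment under <conda_dir>/envs."""
    envs_dir = conda_dir / "envs"
    if not envs_dir.is_dir():
        return {}
    return {env.name: dir_size_bytes(env) for env in envs_dir.iterdir() if env.is_dir()}


if __name__ == "__main__":
    for name, size in sorted(env_sizes().items(), key=lambda item: item[1], reverse=True):
        print(f"{name:20} {size / 1024**3:6.2f} GiB")
```

The largest environments listed first are the best candidates for conda env remove.
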
Use a Conda environment on Grid'5000
As seen in the previous section, the Conda environment is stored by default in the user's homedir (in ~/.conda). Once the environment is created and packages are installed, it is usable on all nodes of the given site.
For interactive jobs
fgrenoble$ oarsub -I
node$ module load miniconda3
node$ eval "$(conda shell.bash hook)"
node$ conda activate <name>
For batch jobs
Warning
As the module command is not a real executable but a shell function, it must be executed in an actual shell to work. A simple oarsub "module load miniconda3" will fail.
fgrenoble$ oarsub 'bash -l -c "module load miniconda3; conda activate <name>; <your script>"'
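Quoting nested shell commands by hand is error-prone. This sketch builds the oarsub command with shlex.quote; it also runs the eval "$(conda shell.bash hook)" step used for interactive jobs, on the assumption that conda activate needs it in a non-interactive shell as well. It also assumes OAR runs the submitted command through a shell, as the single-quoted example above implies. The environment and script names are placeholders, and shlex.join requires Python 3.8+.

```python
import shlex
from typing import List


def oarsub_conda_command(env_name: str, script: str) -> List[str]:
    """Build an argv list that submits `script` in a Conda environment via oarsub."""
    inner = "; ".join([
        "module load miniconda3",
        'eval "$(conda shell.bash hook)"',  # same hook as in the interactive example
        f"conda activate {shlex.quote(env_name)}",
        script,
    ])
    # The whole inner command line is quoted once, as a single argument to bash -c
    return ["oarsub", f"bash -l -c {shlex.quote(inner)}"]


if __name__ == "__main__":
    argv = oarsub_conda_command("myenv", "python train.py")
    print(shlex.join(argv))  # paste this on a frontend, or pass argv to subprocess.run
```
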
Advanced Conda environment operations
Synchronize Conda environments between Grid'5000 sites
- To synchronize a Conda directory from siteA to siteB:
fsiteA$ rsync --dry-run --delete -avz ~/.conda siteB.grid5000.fr:~
To actually transfer the files, remove the --dry-run argument and replace siteB with a real site name.
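If you synchronize environments regularly, building the command in a script helps you keep the dry-run step. This minimal sketch reproduces the rsync command above for a given target site; the function name and the example site are illustrative, and it only prints the command so you can review it before running it.

```python
from pathlib import Path
from typing import List


def rsync_conda_command(target_site: str, dry_run: bool = True) -> List[str]:
    """argv mirroring ~/.conda to <target_site>.grid5000.fr, as in the command above."""
    cmd = ["rsync"]
    if dry_run:
        cmd.append("--dry-run")  # only show what would change
    source = str(Path.home() / ".conda")  # no trailing slash: copies the directory itself
    return cmd + ["--delete", "-avz", source, f"{target_site}.grid5000.fr:~"]


if __name__ == "__main__":
    # Review the dry-run output first, then build the command again with dry_run=False
    print(" ".join(rsync_conda_command("nancy")))
```
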
Share Conda environments between multiple users
You can use two different approaches to share Conda environments with other users.
Export an environment as a yaml file
fgrenoble$ conda env export > environment.yml
- Share it by putting the yaml file in your public folder
fgrenoble$ cp environment.yml ~/public/
- Other users can create the environment from the environment.yml file
fgrenoble$ conda env create -f /home/<login>/public/environment.yml
- Advantage: it prevents other users from damaging the environment by adding packages that conflict with other packages, or by deleting packages that another user might need.
- Drawback: it is not a truly shared environment. The environment is duplicated in other users' home directories, and a modification to one copy is not automatically replicated to the others.
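conda env export writes a YAML file with name, channels and dependencies keys, including exact build strings. When you prefer a short, hand-maintained file listing only the packages you asked for (conda env export --from-history produces something similar), a sketch like the following can generate one. It uses plain string formatting so it needs no YAML library; the helper name and package list are illustrative.

```python
from typing import Iterable


def environment_yml(name: str, channels: Iterable[str], dependencies: Iterable[str]) -> str:
    """Render a minimal environment.yml (name, channels, dependencies)."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save this output as environment.yml, then: conda env create -f environment.yml
    print(environment_yml("test", ["conda-forge"], ["python=3.10", "scikit-learn"]), end="")
```
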
Use a group storage
Group Storage gives you the possibility to share storage between multiple users. You can take advantage of a group storage to share a single Conda environment among multiple users.
- Create a shared Conda environment (--prefix lets you specify the path where the Conda environment is stored)
flyon$ conda create --prefix /srv/storage/<storage_name>@<server_hostname_(fqdn)>/ENVNAME
- Activate the shared environment (share this command with the targeted users)
flyon$ conda activate /srv/storage/<storage_name>@<server_hostname_(fqdn)>/ENVNAME
- Advantage: it avoids storing duplicate packages and makes any modification accessible to all users.
- Drawback: users could potentially harm the environment by installing or removing packages.
Mamba as an alternative to Conda
mamba is a reimplementation of the conda package manager in C++. Mamba is fully compatible with Conda packages and supports most of Conda's commands. It consists of:
- mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solving
- micromamba: a pure C++-based CLI, self-contained in a single-file executable
- libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built
Mamba is relatively new and less widely used than Conda. That means there are probably more undiscovered bugs, and that new bugs may take longer to be discovered. Mamba is worth considering in a DevOps chain that tests and deploys environments (e.g., Docker images) with continuous integration pipelines: Conda has a reputation for being slow when dealing with complex sets of dependencies, so CI jobs can take longer than they need to.
- Install Mamba when Conda is already available
inside$ conda install mamba -c conda-forge
- Installing packages is just as easy, for example:
inside$ mamba install python=3.8 jupyter -c conda-forge
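Because mamba accepts the same install arguments as conda, a script can prefer mamba when it is available and fall back to conda otherwise. This sketch looks both tools up on PATH with shutil.which; the function names are hypothetical, and the which parameter only exists so the choice can be tested without either tool installed.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def pick_solver(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Prefer mamba when it is on PATH, fall back to conda, else None."""
    for tool in ("mamba", "conda"):
        if which(tool):
            return tool
    return None


def install(packages: List[str], channel: str = "conda-forge") -> int:
    """Install packages with mamba or conda (same CLI for `install`)."""
    tool = pick_solver()
    if tool is None:
        raise RuntimeError("neither mamba nor conda found: load the miniconda3 module")
    return subprocess.run([tool, "install", "-y", "-c", channel, *packages], check=False).returncode


if __name__ == "__main__":
    print("Using:", pick_solver())
```
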