Grid5000:Software: Difference between revisions

From Grid5000
m (kavlan availability)
(EnOSlib)
 
(34 intermediate revisions by 12 users not shown)
Line 1: Line 1:
{{Maintainer|Daniel Balouek}}
{{Maintainer|Daniel Balouek}}
{{Status|Draft}}
{{Status|Draft}}
{{Portal|User}}


Software mainly developed in Grid'5000 and available for its users.
__NOEDITSECTION__
__NOEDITSECTION__
__TOC__
__TOC__


=System Software=
This page lists software developed by the Grid'5000 community or the Grid'5000 team.


===Grid5000 Team===
= Experiment management tools =
{{Template:Software|
== Execo: unix processes orchestration, and experiment orchestration ==
|title=[[OAR2|OAR 2]]
Execo offers a Python API for asynchronous control of local or remote, standalone or parallel, Unix processes. It is especially well suited for quickly and easily scripting workflows of parallel/distributed operations on local or remote hosts: automating a scientific workflow, conducting computer science experiments, performing automated tests, etc. The core Python package is <code>execo</code>. The <code>execo_g5k</code> package provides a set of tools and extensions for the Grid5000 testbed. The <code>execo_engine</code> package provides tools to ease the development of computer science experiments.
|purpose=OAR is a resource manager (or batch scheduler) for large clusters. It allows cluster users to submit or reserve nodes either in an interactive or a batch mode.
|logo=[[Image:Logo_oar.png|150px|right]]
|contacts=[mailto:olivier_DOT_richard_AT_imag_DOT_fr Olivier Richard]
|status=Production/Stable
|homepage=http://oar.imag.fr
|availability=OAR 2 is installed on all Grid'5000 clusters
}}


{{Template:Software|
The [https://mimbert.gitlabpages.inria.fr/execo/readme.html README] and [https://mimbert.gitlabpages.inria.fr/execo/userguide.html User Guide] are the best entry points to the execo documentation. The [http://execo.gforge.inria.fr/doc/latest-stable/userguide.html User Guide] shows, through simple step-by-step examples, how to use execo for experiment scripting on Grid'5000.
|title=[[Kadeploy|Kadeploy 3]]
|purpose=Kadeploy is a fast and scalable deployment system for cluster and grid computing. It provides a set of tools for cloning, configuring (post installation) and managing a set of nodes. It currently deploys Linux, *BSD, Windows, and Solaris successfully on x86 and 64-bit computers.
|logo=[[Image:Kadeploy.png|200px|right]]
|contacts=[mailto:emmanuel_DOT_jeanvoine_AT_inria_DOT_fr Emmanuel Jeanvoine]
|status=Production/Stable
|homepage=https://gforge.inria.fr/projects/kadeploy3/
|availability=Kadeploy is installed on all Grid'5000 clusters
}}


{{Template:Software|
* '''Maintainer:''' matthieu dot imbert at inria dot fr
|title=[[KaVLAN]]
* '''Website:''' https://gitlab.inria.fr/mimbert/execo
|purpose=VLAN manipulation tool for network isolation of experiment
|logo=
|contacts=[mailto:nicolas_DOT_niclausse_AT_inria_DOT_fr Nicolas Niclausse]
|status=Production
|homepage=http://kavlan.gforge.inria.fr/
|availability= See status table on [[KaVLAN]] page
}}


===Grid5000 Community===
== Funk: (F)ind yo(U)r (N)odes on g5(K) ==


{{Template:Software|
Funk is a tool that helps you plan your experiments on Grid'5000. It is a Python script based on Execo that can list the nodes available at a given time, find free slots for a combination of resources, etc.
|title=[[TakTuk|TakTuk 3]]
|purpose=TakTuk is a tool for deploying parallel remote executions of commands to a potentially large set of remote nodes. It spreads itself using an adaptive algorithm and sets up an interconnection network to transport commands and perform I/Os multiplexing/demultiplexing. The TakTuk mechanics dynamically adapts to environment (machine performance and current load, network contention) by using a reactive work-stealing algorithm that mix local parallelization and work distribution.
|logo=[[Image:TakTuk.png|200px|right]]
|contacts=[mailto:guillaume_DOT_huard_AT_imag_DOT_fr Guillaume Huard]
|status=Production/Stable
|homepage=http://taktuk.gforge.inria.fr/
|availability=-
}}


{{Template:Software|
* '''Maintainer:''' Matthieu Imbert and Simon Delamare (''firstname.lastname@inria.fr'')
|title=Katapult 3
* '''Website:''' https://github.com/lpouillo/Funk
|purpose='''Katapult is a small, well-tested script to automatically start experiments using deployments.''' Most experiments start by deploying the nodes, re-deploying the nodes if too many of them failed, copying the user's SSH key to the node, etc. Katapult automates all those tasks. This tool is available on most clusters under the name katapult3 and is compliant  with kadeploy3.
|logo=
|contacts=[mailto:lucas.nussbaum_AT_imag_DOT_fr Lucas Nussbaum]
|status=Production/Stable
|homepage=http://www.loria.fr/~lnussbau/katapult.html
|availability=tested on most clusters under the name katapult3.
}}


{{Template:Software|
== PAR: a PARallel and distributed job crusher ==
|title=[[GRUDU|GRUDU  1.1.0]]
|purpose='''GRUDU''' for '''''G'''rid5000 '''R'''eservation '''U'''tility for '''D'''eployment '''U'''sage'' is a tool for managing Grid5000 resources, reservations and deployments. It provides :
# access to the Grid5000 grid environment through a GUI that gives the user detailed information on the grid, the sites, and the current jobs
# reservation of resources on all queues/types through a user-friendly interface
# deployment of images on the nodes reserved with the deploy type (through Kadeploy)
# Ganglia monitoring for all nodes of a site and the history of Ganglia metrics on the nodes of a job
# file transfer and synchronization with the JFTP module
# access to a terminal at each level of the grid (the access frontend, a cluster frontend, or the master node of a reservation)
As GRUDU is a JAVA-based and client-side software it is cross-platform and you only need a JVM 1.5.0 or later on your computer to run it.
|logo=[[Image:GRUDU.png|250px|right]]
|contacts=[mailto:david_DOT_loureiro_AT_ens-lyon_DOT_fr David Loureiro]
|status=Production/Stable
|homepage=http://graal.ens-lyon.fr/DIET/grudu.html
|availability=Portable (written in Java)
}}


{{Template:Software|
PAR is a scalable, dynamic, parallel and distributed execution engine for Bag-of-Tasks. PAR is aimed at multi-core architectures and small clusters. PAR is production ready and should be faster than GNU parallel.
|title=Adage
|purpose=ADAGE is a tool for automatically deploying applications in a grid environment. It targets dynamic applications by providing a large set of services to deal with resources, such as information services, resource reservation/allocation, file transfer, and job launching and monitoring.


ADAGE is internally based on a generic application description model (GADe) so as to support any kind of programming model and/or to support multi-programming model applications
* '''Maintainer:''' Francois Berenger (RIKEN) and Camille Coti (LIPN - Univ. Paris 13)
|logo=
* '''Website:''' http://savannah.nongnu.org/projects/par
|contacts=[mailto:christian_DOT_perez_AT_inria_DOT_fr Christian Perez]
|status=Production/Stable
|homepage=http://adage.gforge.inria.fr/
|availability=Should work on all clusters. It has been tested on various Debian versions, on RH/Fedora-like systems, and on *BSD.
}}


=Experiment Tools=
== Ruby-Cute: Ruby gem for Grid'5000 ==
Ruby-Cute is a set of ''Commonly Used Tools for Experiments'', or ''Critically Useful Tools for Experiments''. It is a Ruby library (gem) aggregating various snippets useful in the context of  development of experiment control scripts on Grid'5000. It includes code for interaction with the Grid'5000 REST API, parallel SSH execution, etc.


{{Template:Software|
Of particular interest is the [http://www.rubydoc.info/github/ruby-cute/ruby-cute/master/file/examples/xp-bootstrap xp-bootstrap example] that shows how to automate the various steps usually performed at the beginning of an experiment (resources reservation, deployment, configuration).
|title=Wrekavoc
|logo=
|purpose=The goal of Wrekavoc is to define and control the heterogeneity of a given platform by degrading CPU, network or memory capabilities of each node composing this platform. The degradation is done remotely, without restarting the hardware. The control is fine, reproducible and independent (one may degrade CPU without modifying the network bandwidth).
|contacts=[mailto:Emmanuel_DOT_Jeannot_AT_loria_DOT_fr Emmanuel Jeannot]
|status=Production/Stable
|homepage=http://wrekavoc.gforge.inria.fr
|availability=Wrekavoc has been successfully tested on several Linux clusters
}}


{{Template:Software|
* '''Maintainers:''' [https://members.loria.fr/lnussbaum/ Lucas Nussbaum]
|title=CoRDAGe
* '''Website:'''  https://github.com/ruby-cute/ruby-cute
|logo=[[Image:Logo_CoRDAGe.jpg|150px|right]]
|purpose=Co-deployment and Re-deployment of Generic Applications.
|contacts=Loïc Cudennec
|status=Beta
|homepage=http://cordage.gforge.inria.fr
|availability=Grid'5000
}}


{{Template:Software|
== grd: modern command-line utility for typical Grid'5000 workflows ==
|title='''NXE'''
|logo=
|purpose='''N'''etwork e'''X'''periment '''E'''ngine is a tool written in Python to automate networking experiments in real testbed environments. It makes it simple to script experiments involving hundreds of nodes.
The scenarios are described through XML files that provide a simple and hierarchical description of the topology, the general configuration and the interactions between the end hosts.
|contacts=Romaric Guillier
|status=Stable
|homepage= http://ens-lyon.fr/LIP/RESO/Software/NXE
|availability=requires Paramiko SSH library (available on compil.lyon.grid5000.fr)
}}


'''grd''' is a command line utility, included in [https://github.com/ruby-cute/ruby-cute Ruby-Cute], that provides a
modern interface to typical Grid'5000 workflows. As an example, <code>grd bootstrap -s
ly -l nodes=3 -w 0:10 -e debian11-x64-min -f setup-script -c</code> will reserve 3
nodes on the ''lyon'' site, provision the ''debian11-x64-min'' environment, copy and run
''setup-script'' on the first node, and then connect interactively using SSH.


{{Template:Software|
For more information about how to use '''grd''', run <code>grd --help</code> and <code>grd
|title=Execo
bootstrap --help</code> on any frontend. '''grd''' can also work from your own computer
|purpose=Execo is an experiment toolkit. It offers a Python API for local or remote, standalone or parallel, processes execution. It is especially well suited for quickly scripting workflows of parallel/distributed operations on local or remote hosts: conducting experiments, performing automated tests, etc. It includes an API for dealing with oar, oargrid, kadeploy, and grid5000 API.
(outside Grid'5000) after installing Ruby-Cute.
|contacts=matthieu dot imbert at inria dot fr
|logo=[[Image:Execo.png|right]]
|status=Version 1.2. Stable
|homepage=http://execo.gforge.inria.fr
|availability=Should work on all clusters, needs to be installed.  
}}


=Development Environments / Middleware=
* '''Maintainers:''' [https://members.loria.fr/lnussbaum/ Lucas Nussbaum]
* '''Website:'''  https://github.com/ruby-cute/ruby-cute


== TakTuk: parallel launcher ==


{{Template:Software|
TakTuk is a tool for deploying parallel remote executions of commands to a potentially large set of remote nodes.
|title=Kaapi
|purpose=KAAPI means Kernel for Adaptative, Asynchronous Parallel and Interactive
programming. It is a C++ library for executing multithreaded
computations with data flow synchronization between threads. The library is
able to schedule fine- and medium-grain programs on distributed machines. The
data flow graph is dynamic (unfolded at runtime). Target architectures are
clusters of SMP machines.
|contacts=[mailto:kaapi-dev-info_AT_lists_DOT_gforge_DOT_inria_DOT_fr kaapi-dev-info@lists.gforge.inria.fr], [mailto:Thierry_DOT_Gautier_AT_imag_DOT_fr Thierry Gautier]
|logo=
|status=Release versions are stable
|homepage= http://kaapi.gforge.inria.fr
|availability= Should work on all clusters
}}


{{Template:Software|
TakTuk is used both internally for the infrastructure management and by end users, to manage large parallel remote executions on Grid'5000.
|title=DIET
|purpose=DIET means Distributed Interactive Engineering Toolbox. It is a C/C++ grid middleware, based on the GridRPC paradigm. DIET provides lots of mechanisms for simplifying the usage of a grid: adaptable job scheduling, performance prediction, data management (replication, persistency...), workflow management (either dataflow and workflows with conditional structures and loops), transparent job submission to batch schedulers (OAR, PBS, SGE, LoadLeveler...), transparent submission to cloud systems (Eucalyptus, Amazon EC2...).
|contacts=[mailto:diet-dev_AT_ens-lyon_DOT_fr diet-dev@ens-lyon.fr], [mailto:Eddy_DOT_Caron_AT_ens-lyon_DOT_fr Eddy Caron]
|logo=[http://graal.ens-lyon.fr/DIET/images/logo_diet.png]
|status=Release versions are stable. Current version: 2.6.1
|homepage= http://graal.ens-lyon.fr/DIET
|availability= Works on all clusters
}}


{{Template:Software|
* '''Maintainer:''' Polaris Team at LIG / Inria Grenoble Rhone-Alpes (Contact: Guillaume Huard)
|title=Marcel
* '''Website:''' https://taktuk.gitlabpages.inria.fr
|purpose= Marcel is a POSIX-compliant thread library featuring a programmable scheduler designed for hierarchical multiprocessor architectures.
|contacts=[mailto:samuel_DOT_thibault_AT_labri_DOT_fr Samuel Thibault]
|logo=
|status= Production
|homepage=http://runtime.futurs.inria.fr/marcel/
|availability=Bordeaux
}}


{{Template:Software|
== Python-grid5000 ==
|title=Mad-MPI
|purpose= Mad-MPI is an efficient implementation of MPI for fast networks.
|contacts=[mailto:nathalie_DOT_furmento_AT_labri_DOT_fr Nathalie Furmento]
|logo=
|status= Production
|homepage=http://runtime.futurs.inria.fr/MadMPI/
|availability=Bordeaux
}}


{{Template:Software|
Python-grid5000 is a thin wrapper around the Grid'5000 REST API. It is a Python library installable with pip.
|title=MPICH-Madeleine
It exposes the main Grid'5000 resources (jobs, deployments, storage, VLANs, ...) and allows them to be modified in a user-friendly manner.
|purpose= MPICH-Madeleine is an MPI implementation for clusters and clusters of clusters with heterogenous networks.
|contacts=[mailto:nathalie_DOT_furmento_AT_labri_DOT_fr Nathalie Furmento], [mailto:guillaume_DOT_mercier_AT_labri_DOT_fr Guillaume Mercier]
|logo=
|status= Production
|homepage=http://runtime.futurs.inria.fr/mpi/
|availability=Bordeaux
}}


{{Template:Software|
* '''Maintainer:''' Matthieu Simonin
|title=NewMadeleine
* '''Website:''' https://gitlab.inria.fr/msimonin/python-grid5000
|purpose= the NewMadeleine communication library provides extended capabilities for dynamic communication optimization on top of high performance networks.
 
|contacts=[mailto:olivier_DOT_aumage_AT_labri_DOT_fr Olivier Aumage], [mailto:elisabeth_DOT_brunet_AT_labri_DOT_fr Elisabeth Brunet]
== EnOSlib ==
|logo=
 
|status= Production
EnOSlib helps you deploy applications on various platforms. It targets role based deployments and allows you to describe your operations using tasks.
|homepage=http://runtime.futurs.inria.fr/newmadeleine/
More pragmatically, with EnOSlib, you can iterate on your application deployment and experimental workflow locally before moving to a large testbed like Grid'5000 or Chameleon. It saves time and energy.
|availability=Bordeaux
 
}}
* '''Maintainers:''' Matthieu Simonin, Baptiste Jonglez
* '''Website:''' https://discovery.gitlabpages.inria.fr/enoslib/
 
== Restfully ==
 
Restfully is a general-purpose client library for RESTful APIs, written in Ruby. Its goal is to abstract the nitty-gritty details of exchanging HTTP requests between the user-agent and the server. In the context of Grid5000, it is particularly suited to exploring the reference API, and also works with the jobs and deployments APIs.
 
* '''Maintainer:''' Cyril Rohr and Grid'5000 team
* '''Website:''' https://rubygems.org/gems/restfully/ and https://github.com/grid5000/restfully
 
== Grid5000 Terraform provider ==
 
[https://www.terraform.io Terraform] is an open-source infrastructure-as-code tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files. The Grid5000 provider allows interaction with OAR jobs, Kadeploy bare-metal deployments and Ceph pools. Combined with the [https://registry.terraform.io/providers/rancher/rke/latest RKE provider], it lets you deploy Kubernetes clusters and complex software stacks; see the [https://github.com/pmorillon/terraform-provider-grid5000/tree/master/examples tutorials].
 
* '''Maintainer:''' Pascal Morillon (''firstname.lastname@irisa.fr'')
* '''Website:''' https://registry.terraform.io/providers/pmorillon/grid5000/latest
* '''Tutorials:''' https://github.com/pmorillon/terraform-provider-grid5000/tree/master/examples
 
== Dask-jobqueue ==
 
[https://jobqueue.dask.org/en/latest/index.html Dask-jobqueue] is a Python library which makes it easy to deploy [https://www.dask.org/ Dask] on common job queuing systems typically found in high performance supercomputers, academic research institutions, and other clusters. Since Dask-jobqueue provides interfaces for OAR and Slurm-based clusters, it can be used to facilitate the switch between OAR and Slurm based resource managers.
 
* '''Maintainer:''' Dask-jobqueue Development Team
* '''Website:''' https://jobqueue.dask.org/en/latest/
* '''Tutorials:''' https://www.grid5000.fr/w/Dask-jobqueue
 
= Drivers for virtualization and containers solutions =
 
== docker-machine-driver-g5k: using a Grid'5000 physical machine directly from Docker ==
 
'''docker-machine-driver-g5k''' is a DockerMachine driver that leverages the provisioning of a Grid'5000 node as a machine in the context of Docker.
 
* '''Maintainers:''' Spirals team at CRIStAL / Inria Lille - Nord Europe (Contact: [http://romain.rouvoy.fr Romain Rouvoy])
* '''Website:'''  https://github.com/Spirals-Team/docker-machine-driver-g5k
 
== docker-g5k: provisioning a Docker cluster within Grid'5000  ==
 
'''docker-g5k''' is a command line interface (CLI) and a Go library to provision a cluster of docker hosts made of multiple Grid5000 nodes, possibly provisioned across several sites.
 
* '''Maintainers:''' Spirals team at CRIStAL / Inria Lille - Nord Europe (Contact: [http://romain.rouvoy.fr Romain Rouvoy])
* '''Website:'''  https://github.com/Spirals-Team/docker-g5k
 
== vagrant-grid5000: using Grid'5000 physical machines directly from Vagrant ==
 
'''vagrant-grid5000''' is a Vagrant Provider Plugin that makes it possible to use Grid'5000 nodes as machines in the context of Vagrant. When ''vagrant up'' is called, a node is reserved and deployed (by default, on the ''nancy'' site, and with the ''jessie-x64-min'' environment), and then provisioned by Vagrant.
 
* '''Maintainers:''' MADYNES team at LORIA / Inria Nancy - Grand Est (Contact: [https://members.loria.fr/lnussbaum/ Lucas Nussbaum])
* '''Website:'''  https://github.com/lnussbaum/vagrant-grid5000
 
== vagrant-g5k: manage virtual machines on Grid'5000 using vagrant ==
 
'''vagrant-g5k''' is a Vagrant provider to manage virtual machines on Grid'5000. It supports different kinds of networking (NAT, bridge) and disk backends (local, RBD, snapshots).
The usual Vagrant commands are supported (provision, rsync, ssh, ...).
 
* '''Maintainers:''' Matthieu Simonin (Contact: [mailto:matthieu.simonin@inria.fr matthieu.simonin@inria.fr])
* '''Website:'''  https://github.com/msimonin/vagrant-g5k
 
= Deployment of complex software stacks inside Grid'5000 =
 
== benchmark-containers: leveraging the deployment of standard benchmarks ==
 
'''benchmark-containers''' shares a repository of state-of-the-art benchmarks (Parsec, IOzone, etc.) packaged as Docker containers.
 
* '''Maintainers:''' Spirals team at CRIStAL / Inria Lille - Nord Europe (Contact: [http://romain.rouvoy.fr Romain Rouvoy])
* '''Website:'''  https://github.com/Spirals-Team/benchmark-containers / https://hub.docker.com/r/spirals
 
== EnOS : Experimental environment for OpenStack ==
 
Enos deploys OpenStack and targets reproducible experiments. With Enos you can deploy, customize, and benchmark OpenStack.
 
* '''Maintainers:''' Discovery IPL (Contacts: ronan-alexandre.cherrueau@inria.fr / matthieu.simonin@inria.fr)
* '''Website:''' https://github.com/beyondtheclouds/enos
 
== hadoop-benchmark: leveraging the deployment of Vanilla Hadoop ==
 
'''hadoop-benchmark''' provides Docker containers to build a Hadoop infrastructure and experiment with feedback control loops on top of it.
 
* '''Maintainers:''' Spirals team at CRIStAL / Inria Lille - Nord Europe (Contact: [http://romain.rouvoy.fr Romain Rouvoy])
* '''Website:'''  https://github.com/Spirals-Team/hadoop-benchmark / https://hub.docker.com/r/spirals
 
= Emulation tools =
== Distem: CPU performance and network emulator ==
 
Distem is a distributed systems emulator. It can be used to transform a homogeneous Grid'5000 cluster (composed of identical nodes) into an experimental platform where nodes have different performance, and are linked together through a complex network topology, making it the ideal tool to benchmark applications targeting such environments.
 
* '''Maintainers:''' RESIST team at LORIA / Inria Nancy - Grand Est (Contact: [https://members.loria.fr/lnussbaum/ Lucas Nussbaum])
* '''Website:'''  https://distem.gitlabpages.inria.fr/
 
= Monitoring software =
 
== PowerAPI : monitoring the power consumption of processes ==
 
'''PowerAPI''' is a middleware toolkit for building software-defined power meters that can monitor the power consumption of software processes and containers across several Grid'5000 nodes.
 
* '''Maintainers:''' Spirals team at CRIStAL / Inria Lille - Nord Europe (Contact: [http://romain.rouvoy.fr Romain Rouvoy])
* '''Website:'''  http://powerapi.org / https://hub.docker.com/r/spirals
 
 
= Infrastructure software =
This section lists software that is used to run the Grid'5000 infrastructure.
 
== Kadeploy: bare-metal cluster provisioning solution ==
 
Kadeploy is a scalable, efficient and reliable deployment system (cluster provisioning solution) for cluster computing. It provides a set of tools for cloning, configuring (post installation) and managing cluster nodes. It can deploy a 300-node cluster in a few minutes, without intervention from the system administrator. It can deploy Linux, *BSD, Windows, and Solaris.
 
On Grid'5000, it allows users to re-install nodes and get administrator access on them for the duration of their resource reservation.
 
* '''Maintainers:''' MADYNES team at LORIA / Inria Nancy - Grand Est (Contact: [https://members.loria.fr/lnussbaum/ Lucas Nussbaum]). Earlier versions were developed by the LIG laboratory in Grenoble.
* '''Website:'''  https://kadeploy.gitlabpages.inria.fr/
 
== Kameleon: a reproducible appliance builder ==
Kameleon is a simple but powerful tool to generate customized system appliances.
 
Kameleon is used in the process of generating Grid'5000 system environments, which users deploy with Kadeploy, from scripted recipes.
 
* '''Maintainers:''' DataMove team at LIG / Inria Grenoble Rhone-Alpes (Contact: Pierre Neyron, Michael Mercier)
* '''Website:'''  http://kameleon.imag.fr
 
== KaVlan: network reconfiguration ==
KaVLAN provides network isolation capabilities for Grid'5000 users' experimentations, via a high-level, user-driven interface to VLANs (802.1Q).
 
* '''Maintainers:''' Nicolas Niclausse (''firstname.lastname@inria.fr'')
* '''Website:''' https://www.grid5000.fr/mediawiki/index.php/KaVLAN
 
== Kwollect :  collecting metrics of IT infrastructures ==
Kwollect is a framework for collecting metrics of IT infrastructures (energy, performance, environmental, ...) and making them available to users.
 
It targets high-frequency collection with lossless, long-term storage of metrics and focuses on environmental metrics.
 
* '''Maintainers:''' Grid'5000 Team (''support-staff at lists.grid5000.fr'')
* '''Website:''' https://gitlab.inria.fr/grid5000/kwollect
 
== OAR2: resource and job management system ==
OAR is a versatile resource and task manager (also called a batch scheduler) for HPC clusters, and other computing infrastructures (like distributed computing experimental testbeds where versatility is key).
 
OAR is the low-level resource reservation tool managing most Grid'5000 resources, which users access directly or through the Grid'5000 API.
 
* '''Maintainers:''' DataMove team at LIG / Inria Grenoble Rhone-Alpes (Contact: [http://oar.imag.fr/contacts OAR team]).
* '''Website:'''  http://oar.imag.fr




{{Template:Help|text=Grid'5000 users, please send an email to [mailto:web-staff_AT_lists_DOT_grid5000_DOT_fr web-staff] if you would like new software to be listed on this page.}}

Latest revision as of 17:49, 12 June 2024


This page lists software developed by the Grid'5000 community or the Grid'5000 team.

Experiment management tools

Execo: unix processes orchestration, and experiment orchestration

Execo offers a Python API for asynchronous control of local or remote, standalone or parallel, Unix processes. It is especially well suited for quickly and easily scripting workflows of parallel/distributed operations on local or remote hosts: automating a scientific workflow, conducting computer science experiments, performing automated tests, etc. The core Python package is execo. The execo_g5k package provides a set of tools and extensions for the Grid5000 testbed. The execo_engine package provides tools to ease the development of computer science experiments.

The README and User Guide are the best entry points to the execo documentation. The User Guide shows, through simple step-by-step examples, how to use execo for experiment scripting on Grid'5000.
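
The pattern that Execo automates, starting several Unix processes without blocking and collecting their results later, can be sketched with the Python standard library alone. This is not Execo's API: the helper names start_all and wait_all are hypothetical, and local Python one-liners stand in for the remote commands an experiment would run.

```python
import subprocess
import sys


def start_all(commands):
    """Start every command without waiting; return the Popen handles."""
    return [
        subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        for cmd in commands
    ]


def wait_all(processes):
    """Wait for each process and collect (exit code, stdout) pairs."""
    results = []
    for proc in processes:
        out, _err = proc.communicate()
        results.append((proc.returncode, out.strip()))
    return results


# Hypothetical workload: three local one-liners standing in for remote commands.
commands = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(3)]
processes = start_all(commands)  # all three processes now run concurrently
results = wait_all(processes)
```

Separating the start step from the wait step is what lets the processes run in parallel; a library such as Execo adds remote execution, timeouts and error handling on top of this idea.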

Funk: (F)ind yo(U)r (N)odes on g5(K)

Funk is a tool that helps you plan your experiments on Grid'5000. It is a Python script based on Execo that can list the nodes available at a given time, find free slots for a combination of resources, etc.

PAR: a PARallel and distributed job crusher

PAR is a scalable, dynamic, parallel and distributed execution engine for Bag-of-Tasks. PAR is aimed at multi-core architectures and small clusters. PAR is production ready and should be faster than GNU parallel.

Ruby-Cute: Ruby gem for Grid'5000

Ruby-Cute is a set of Commonly Used Tools for Experiments, or Critically Useful Tools for Experiments. It is a Ruby library (gem) aggregating various snippets useful in the context of development of experiment control scripts on Grid'5000. It includes code for interaction with the Grid'5000 REST API, parallel SSH execution, etc.

Of particular interest is the xp-bootstrap example that shows how to automate the various steps usually performed at the beginning of an experiment (resources reservation, deployment, configuration).

grd: modern command-line utility for typical Grid'5000 workflows

grd is a command line utility, included in Ruby-Cute, that provides a modern interface to typical Grid'5000 workflows. As an example, grd bootstrap -s ly -l nodes=3 -w 0:10 -e debian11-x64-min -f setup-script -c will reserve 3 nodes on the lyon site, provision the debian11-x64-min environment, copy and run setup-script on the first node, and then connect interactively using SSH.

For more information about how to use grd, run grd --help and grd bootstrap --help on any frontend. grd can also work from your own computer (outside Grid'5000) after installing Ruby-Cute.

TakTuk: parallel launcher

TakTuk is a tool for deploying parallel remote executions of commands to a potentially large set of remote nodes.

TakTuk is used both internally for the infrastructure management and by end users, to manage large parallel remote executions on Grid'5000.

Python-grid5000

Python-grid5000 is a thin wrapper around the Grid'5000 REST API. It is a Python library installable with pip. It exposes the main Grid'5000 resources (jobs, deployments, storage, VLANs, ...) and allows them to be modified in a user-friendly manner.
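
To give an idea of what such a wrapper hides, here is a minimal sketch of the kind of HTTP request a job submission may translate to, built with the standard library only and never sent. It does not use Python-grid5000 itself. The base URL, the /sites/<site>/jobs path and the "resources" and "command" fields are assumptions to be checked against the Grid'5000 API documentation, and build_job_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumed base URL of the REST API; verify against the official documentation.
API_BASE = "https://api.grid5000.fr/stable"


def build_job_request(site, nodes, walltime, command):
    """Build (but do not send) a POST request describing a job submission."""
    url = f"{API_BASE}/sites/{site}/jobs"
    # Assumed payload layout: a resource string plus the command to run.
    payload = {
        "resources": f"nodes={nodes},walltime={walltime}",
        "command": command,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: 3 nodes on the lyon site for 10 minutes.
request = build_job_request("lyon", 3, "0:10", "sleep 600")
```

Sending the request would additionally require authentication, which is one of the details a client library takes care of.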

EnOSlib

EnOSlib helps you deploy applications on various platforms. It targets role-based deployments and allows you to describe your operations using tasks. More pragmatically, with EnOSlib, you can iterate on your application deployment and experimental workflow locally before moving to a large testbed like Grid'5000 or Chameleon. It saves time and energy.

Restfully

Restfully is a general-purpose client library for RESTful APIs, written in Ruby. Its goal is to abstract the nitty-gritty details of exchanging HTTP requests between the user-agent and the server. In the context of Grid5000, it is particularly suited to exploring the reference API, and also works with the jobs and deployments APIs.

Grid5000 Terraform provider

Terraform is an open-source infrastructure-as-code tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files. The Grid5000 provider allows interaction with OAR jobs, Kadeploy bare-metal deployments and Ceph pools. Combined with the RKE provider, it lets you deploy Kubernetes clusters and complex software stacks; see the tutorials.

Dask-jobqueue

Dask-jobqueue is a Python library which makes it easy to deploy Dask on common job queuing systems typically found in high performance supercomputers, academic research institutions, and other clusters. Since Dask-jobqueue provides interfaces for OAR and Slurm-based clusters, it can be used to facilitate the switch between OAR and Slurm based resource managers.
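
The switch between resource managers mentioned above can be sketched as follows. The class names OARCluster and SLURMCluster come from the dask_jobqueue package; the helpers cluster_class_name and make_cluster and the resource values are hypothetical, and the library is only imported when make_cluster is actually called, so the selection logic can be read and run without it.

```python
import importlib

# Map a scheduler name to the dask_jobqueue class that targets it.
CLUSTER_CLASS_NAMES = {"oar": "OARCluster", "slurm": "SLURMCluster"}


def cluster_class_name(scheduler):
    """Return the dask_jobqueue class name for a scheduler ("oar" or "slurm")."""
    try:
        return CLUSTER_CLASS_NAMES[scheduler.lower()]
    except KeyError:
        raise ValueError(f"unsupported scheduler: {scheduler!r}") from None


def make_cluster(scheduler, **resources):
    """Instantiate the matching cluster class (requires dask-jobqueue installed)."""
    module = importlib.import_module("dask_jobqueue")
    return getattr(module, cluster_class_name(scheduler))(**resources)


# Hypothetical resource request, shared by both back ends.
resources = {"cores": 4, "memory": "8GB", "walltime": "00:30:00"}
selected = [cluster_class_name(name) for name in ("oar", "slurm")]
```

On a machine with dask-jobqueue installed, make_cluster("oar", **resources) and make_cluster("slurm", **resources) would build the two cluster objects from the same resource description, which is what makes moving from one resource manager to the other a small change.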

Drivers for virtualization and containers solutions

docker-machine-driver-g5k: using a Grid'5000 physical machine directly from Docker

docker-machine-driver-g5k is a DockerMachine driver that leverages the provisioning of a Grid'5000 node as a machine in the context of Docker.

docker-g5k: provisioning a Docker cluster within Grid'5000

docker-g5k is a command line interface (CLI) and a Go library to provision a cluster of docker hosts made of multiple Grid5000 nodes, possibly provisioned across several sites.

vagrant-grid5000: using Grid'5000 physical machines directly from Vagrant

vagrant-grid5000 is a Vagrant Provider Plugin that makes it possible to use Grid'5000 nodes as machines in the context of Vagrant. When vagrant up is called, a node is reserved and deployed (by default, on the nancy site, and with the jessie-x64-min environment), and then provisioned by Vagrant.

vagrant-g5k: manage virtual machines on Grid'5000 using vagrant

vagrant-g5k is a Vagrant provider to manage virtual machines on Grid'5000. It supports different kinds of networking (NAT, bridge) and disk backends (local, RBD, snapshots). The usual Vagrant commands are supported (provision, rsync, ssh, ...).

Deployment of complex software stacks inside Grid'5000

benchmark-containers: leveraging the deployment of standard benchmarks

benchmark-containers shares a repository of state-of-the-art benchmarks (Parsec, IOzone, etc.) packaged as Docker containers.

EnOS : Experimental environment for OpenStack

Enos deploys OpenStack and targets reproducible experiments. With Enos you can deploy, customize, and benchmark OpenStack.

hadoop-benchmark: leveraging the deployment of Vanilla Hadoop

hadoop-benchmark provides Docker containers to build a Hadoop infrastructure and experiment with feedback control loops on top of it.

Emulation tools

Distem: CPU performance and network emulator

Distem is a distributed systems emulator. It can be used to transform a homogeneous Grid'5000 cluster (composed of identical nodes) into an experimental platform where nodes have different performance, and are linked together through a complex network topology, making it the ideal tool to benchmark applications targeting such environments.

Monitoring software

PowerAPI : monitoring the power consumption of processes

PowerAPI is a middleware toolkit for building software-defined power meters that can monitor the power consumption of software processes and containers across several Grid'5000 nodes.


Infrastructure software

This section lists software that is used to run the Grid'5000 infrastructure.

Kadeploy: bare-metal cluster provisioning solution

Kadeploy is a scalable, efficient and reliable deployment system (cluster provisioning solution) for cluster computing. It provides a set of tools for cloning, configuring (post installation) and managing cluster nodes. It can deploy a 300-node cluster in a few minutes, without intervention from the system administrator. It can deploy Linux, *BSD, Windows, and Solaris.

On Grid'5000, it allows users to re-install nodes and get administrator access on them for the duration of their resource reservation.

Kameleon: a reproducible appliance builder

Kameleon is a simple but powerful tool to generate customized system appliances.

Kameleon is used in the process of generating Grid'5000 system environments, which users deploy with Kadeploy, from scripted recipes.

  • Maintainers: DataMove team at LIG / Inria Grenoble Rhone-Alpes (Contact: Pierre Neyron, Michael Mercier)
  • Website: http://kameleon.imag.fr

KaVlan: network reconfiguration

KaVLAN provides network isolation capabilities for Grid'5000 users' experimentations, via a high-level, user-driven interface to VLANs (802.1Q).
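
For readers unfamiliar with 802.1Q, the sketch below packs and parses the 4-byte VLAN tag that the standard inserts into Ethernet frames: a 16-bit tag protocol identifier (0x8100) followed by 3 priority bits, 1 drop-eligible bit and a 12-bit VLAN identifier. It only illustrates the frame format; it is not KaVLAN code, and the function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier defined by IEEE 802.1Q


def pack_vlan_tag(vlan_id, priority=0, dei=0):
    """Pack an 802.1Q tag: TPID, then PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def unpack_vlan_id(tag):
    """Extract the VLAN identifier from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


# Hypothetical VLAN 42 with priority 3.
tag = pack_vlan_tag(vlan_id=42, priority=3)
vlan_id = unpack_vlan_id(tag)
```

Frames carrying different VLAN identifiers are kept apart by the switches, which is the mechanism behind the network isolation described above.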

Kwollect : collecting metrics of IT infrastructures

Kwollect is a framework for collecting metrics of IT infrastructures (energy, performance, environmental, ...) and making them available to users.

It targets high-frequency collection with lossless, long-term storage of metrics and focuses on environmental metrics.

OAR2: resource and job management system

OAR is a versatile resource and task manager (also called a batch scheduler) for HPC clusters, and other computing infrastructures (like distributed computing experimental testbeds where versatility is key).

OAR is the low-level resource reservation tool managing most Grid'5000 resources, which users access directly or through the Grid'5000 API.


Grid'5000 users, please send an email to web-staff if you would like new software to be listed on this page.