Grid5000:Software

Latest revision as of 17:49, 12 June 2024


This page lists software developed by the Grid'5000 community or the Grid'5000 team.

Experiment management tools

Execo: Unix process orchestration and experiment orchestration

Execo offers a Python API for asynchronous control of local or remote, standalone or parallel, Unix processes. It is especially well suited for quickly and easily scripting workflows of parallel/distributed operations on local or remote hosts: automating a scientific workflow, conducting computer science experiments, performing automated tests, etc. The core Python package is execo. The execo_g5k package provides a set of tools and extensions for the Grid'5000 testbed. The execo_engine package provides tools to ease the development of computer science experiments.

The README and User Guide are the best entry points to the execo documentation. The User Guide uses simple, step-by-step examples to show how to use execo for experiment scripting on Grid'5000.
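
To give a feel for what "asynchronous control of parallel processes" means, here is a minimal sketch that uses only the Python standard library (asyncio). It does not use the execo API; the function names run_command, run_all and demo are hypothetical, and the three one-line Python commands merely stand in for real experiment tasks.

```python
import asyncio
import sys


async def run_command(argv):
    """Start one local process and return (exit code, stdout text) when it ends."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    return proc.returncode, stdout.decode().strip()


async def run_all(commands):
    """Run all commands concurrently; results come back in input order."""
    return await asyncio.gather(*(run_command(argv) for argv in commands))


def demo():
    # Placeholder workload (an assumption for illustration): three short
    # Python one-liners, each printing the square of a small integer.
    commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(3)]
    return asyncio.run(run_all(commands))
```

Calling demo() starts the three processes at once and returns one (exit code, output) pair per command. Execo adds, among other things, remote hosts and Grid'5000-specific helpers on top of this basic idea; see its User Guide for the actual API.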

Funk: (F)ind yo(U)r (N)odes on g5(K)

Funk is a tool that helps you plan your experiments on Grid'5000. It is a Python script based on Execo that can list the nodes available at a given time, find free slots for a combination of resources, etc.

PAR: a PARallel and distributed job crusher

PAR is a scalable, dynamic, parallel and distributed execution engine for Bag-of-Tasks. PAR is aimed at multi-core architectures and small clusters. PAR is production ready and should be faster than GNU parallel.

Ruby-Cute: Ruby gem for Grid'5000

Ruby-Cute is a set of Commonly Used Tools for Experiments, or Critically Useful Tools for Experiments. It is a Ruby library (gem) aggregating various snippets useful in the context of development of experiment control scripts on Grid'5000. It includes code for interaction with the Grid'5000 REST API, parallel SSH execution, etc.

Of particular interest is the xp-bootstrap example, which shows how to automate the steps usually performed at the beginning of an experiment (resource reservation, deployment, configuration).

grd: modern command-line utility for typical Grid'5000 workflows

grd is a command-line utility, included in Ruby-Cute, that provides a modern interface to typical Grid'5000 workflows. As an example, grd bootstrap -s ly -l nodes=3 -w 0:10 -e debian11-x64-min -f setup-script -c will reserve 3 nodes on the lyon site, provision the debian11-x64-min environment, copy and run setup-script on the first node, and then connect interactively using SSH.

For more information about how to use grd, run grd --help and grd bootstrap --help on any frontend. grd can also work from your own computer (outside Grid'5000) after installing Ruby-Cute.
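
When grd is driven from a script, it can help to assemble the command line programmatically. The sketch below rebuilds the example command above from its parts. The helper name grd_bootstrap_argv is hypothetical, only the flags shown in the example are used, and the reading of -w as a walltime is an assumption (check grd bootstrap --help).

```python
import shlex


def grd_bootstrap_argv(site, nodes, walltime, environment, script, connect=True):
    """Build the argument list for the 'grd bootstrap' example shown above."""
    argv = [
        "grd", "bootstrap",
        "-s", site,               # site, e.g. "ly" for lyon
        "-l", f"nodes={nodes}",   # number of nodes to reserve
        "-w", walltime,           # assumed to be the walltime; not stated in the text
        "-e", environment,        # environment to provision
        "-f", script,             # script copied to and run on the first node
    ]
    if connect:
        argv.append("-c")         # connect interactively using SSH afterwards
    return argv


argv = grd_bootstrap_argv("ly", 3, "0:10", "debian11-x64-min", "setup-script")
command_line = shlex.join(argv)
print(command_line)
```

The resulting list can be passed to subprocess.run(argv) on a machine where grd is installed; the joined string is only for display or logging.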

TakTuk: parallel launcher

TakTuk is a tool for deploying parallel remote executions of commands to a potentially large set of remote nodes.

TakTuk is used both internally for the infrastructure management and by end users, to manage large parallel remote executions on Grid'5000.

Python-grid5000

Python-grid5000 is a thin wrapper around the Grid'5000 REST API. It is a Python library that can be installed with pip. It exposes the main Grid'5000 resources (jobs, deployments, storage, VLANs, ...) and lets you modify them in a friendly manner.
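
To illustrate the kind of REST call that such a wrapper hides, here is a minimal standard-library sketch that prepares (but does not send) a job submission request. It does not use Python-grid5000 itself. The API root, the /sites/<site>/jobs path, the payload fields and the helper name build_job_request are assumptions made for illustration; refer to the Grid'5000 API documentation for the authoritative format.

```python
import json
import urllib.request

# Assumed API entry point; verify against the Grid'5000 API documentation.
API_ROOT = "https://api.grid5000.fr/stable"


def build_job_request(site, resources, command):
    """Prepare an HTTP POST describing a job on the given site (nothing is sent)."""
    url = f"{API_ROOT}/sites/{site}/jobs"
    # Assumed payload shape: a resource description and a command to run.
    payload = json.dumps({"resources": resources, "command": command}).encode()
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_job_request("lyon", "nodes=1,walltime=0:10", "sleep 600")
print(request.get_method(), request.full_url)
```

A wrapper library spares you from writing this plumbing, along with authentication and response parsing, by exposing sites, jobs and deployments as Python objects.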

EnOSlib

EnOSlib helps you deploy applications on various platforms. It targets role-based deployments and allows you to describe your operations using tasks. More pragmatically, with EnOSlib, you can iterate on your application deployment and experimental workflow locally before moving to a large testbed like Grid'5000 or Chameleon. It saves time and energy.

Restfully

Restfully is a general-purpose client library for RESTful APIs, written in Ruby. Its goal is to abstract the nitty-gritty details of exchanging HTTP requests between the user-agent and the server. In the context of Grid'5000, it is particularly suited to exploring the reference API, and also works with the jobs and deployments APIs.

Grid5000 Terraform provider

Terraform is an open-source infrastructure-as-code tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files. The Grid'5000 provider lets you interact with OAR jobs, Kadeploy bare-metal deployments and Ceph pools. Combined with the RKE provider, it lets you deploy Kubernetes clusters and complex software stacks; see the tutorials.

Dask-jobqueue

Dask-jobqueue is a Python library which makes it easy to deploy Dask on common job queuing systems typically found in high-performance supercomputers, academic research institutions, and other clusters. Since Dask-jobqueue provides interfaces for both OAR-based and Slurm-based clusters, it can be used to ease the switch between OAR-based and Slurm-based resource managers.
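
The following sketch shows one way to keep experiment code independent of the resource manager: the cluster class is picked by name, so moving between OAR and Slurm changes a single argument. The helper make_cluster and the CLUSTER_CLASS_NAMES table are hypothetical; OARCluster and SLURMCluster are the Dask-jobqueue class names, and the keyword values in the usage comment are placeholders to adapt to your site.

```python
# Map a resource manager name to the matching Dask-jobqueue class name.
CLUSTER_CLASS_NAMES = {"oar": "OARCluster", "slurm": "SLURMCluster"}


def make_cluster(manager, **cluster_kwargs):
    """Create a Dask-jobqueue cluster object for the given resource manager."""
    try:
        class_name = CLUSTER_CLASS_NAMES[manager.lower()]
    except KeyError:
        raise ValueError(f"unsupported resource manager: {manager!r}") from None
    # Imported lazily so the name lookup above works even where
    # dask_jobqueue is not installed.
    import dask_jobqueue
    return getattr(dask_jobqueue, class_name)(**cluster_kwargs)


# Usage, on a frontend where dask_jobqueue is installed (placeholder values):
#   cluster = make_cluster("oar", cores=4, memory="8GB", walltime="01:00:00")
#   cluster.scale(jobs=2)   # request two batch jobs' worth of workers
```

Switching the first argument to "slurm" leaves the rest of the workflow untouched, although site-specific options (queues, projects, extra directives) may still differ between the two systems.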

Drivers for virtualization and container solutions

docker-machine-driver-g5k: using a Grid'5000 physical machine directly from Docker

docker-machine-driver-g5k is a Docker Machine driver that provisions a Grid'5000 node as a machine in the context of Docker.

docker-g5k: provisioning a Docker cluster within Grid'5000

docker-g5k is a command-line interface (CLI) and a Go library to provision a cluster of Docker hosts made of multiple Grid'5000 nodes, possibly provisioned across several sites.

vagrant-grid5000: using Grid'5000 physical machines directly from Vagrant

vagrant-grid5000 is a Vagrant Provider Plugin that makes it possible to use Grid'5000 nodes as machines in the context of Vagrant. When vagrant up is called, a node is reserved and deployed (by default, on the nancy site, and with the jessie-x64-min environment), and then provisioned by Vagrant.

vagrant-g5k: manage virtual machines on Grid'5000 using Vagrant

vagrant-g5k is a Vagrant provider to manage virtual machines on Grid'5000. It supports different kinds of networking (NAT, bridge) and disk backends (local, RBD, snapshots). The usual Vagrant commands are supported (provision, rsync, ssh, ...).

Deployment of complex software stacks inside Grid'5000

benchmark-containers: leveraging the deployment of standard benchmarks

benchmark-containers shares a repository of state-of-the-art benchmarks (Parsec, IOzone, etc.) packaged as Docker containers.

EnOS: Experimental environment for OpenStack

EnOS deploys OpenStack and targets reproducible experiments. With EnOS you can deploy, customize, and benchmark OpenStack.

hadoop-benchmark: leveraging the deployment of Vanilla Hadoop

hadoop-benchmark provides Docker containers to build a Hadoop infrastructure and experiment with feedback control loops on top of it.

Emulation tools

Distem: CPU performance and network emulator

Distem is a distributed systems emulator. It can be used to transform a homogeneous Grid'5000 cluster (composed of identical nodes) into an experimental platform where nodes have different performance and are linked together through a complex network topology, which makes it well suited to benchmarking applications that target such environments.

Monitoring software

PowerAPI: monitoring the power consumption of processes

PowerAPI is a middleware toolkit for building software-defined power meters that can monitor the power consumption of software processes and containers across several Grid'5000 nodes.


Infrastructure software

This section lists software that is used to run the Grid'5000 infrastructure.

Kadeploy: bare-metal cluster provisioning solution

Kadeploy is a scalable, efficient and reliable deployment system (cluster provisioning solution) for cluster computing. It provides a set of tools for cloning, configuring (post installation) and managing cluster nodes. It can deploy a 300-node cluster in a few minutes, without intervention from the system administrator. It can deploy Linux, *BSD, Windows, and Solaris.

On Grid'5000, it allows users to re-install nodes and get administrator access on them for the duration of their resource reservation.

Kameleon: a reproducible appliance builder

Kameleon is a simple but powerful tool to generate customized system appliances.

Kameleon is used in the process of generating Grid'5000 system environments, which users deploy with Kadeploy, from scripted recipes.

  • Maintainers: DataMove team at LIG / Inria Grenoble Rhone-Alpes (Contact: Pierre Neyron, Michael Mercier)
  • Website: http://kameleon.imag.fr

KaVLAN: network reconfiguration

KaVLAN provides network isolation capabilities for Grid'5000 users' experiments, via a high-level, user-driven interface to VLANs (802.1Q).

Kwollect: collecting metrics of IT infrastructures

Kwollect is a framework for collecting metrics of IT infrastructures (energy, performance, environmental, ...) and making them available to users.

It targets high-frequency collection with lossless, long-term storage of metrics and focuses on environmental metrics.

OAR2: resource and job management system

OAR is a versatile resource and task manager (also called a batch scheduler) for HPC clusters and other computing infrastructures (such as distributed computing experimental testbeds, where versatility is key).

OAR is the low-level resource reservation tool that manages most Grid'5000 resources. Users call it directly or through the Grid'5000 API.


Grid'5000 users, please send an email to web-staff if you would like new software to appear on this page.