Experiment scripting tutorial

Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

This page presents several tools that aim at facilitating experiment scripting on Grid'5000.

  • Execo offers a Python API for asynchronous control of local or remote, standalone or parallel Unix processes. It is especially well suited for quickly scripting workflows of parallel/distributed operations on local or remote hosts: automating a scientific workflow, conducting computer science experiments, performing automated tests, etc. The core Python package is execo. The execo_g5k package provides a set of tools and extensions for the Grid'5000 testbed. The execo_engine package provides tools that ease the development of computer science experiments.
  • Ruby-Cute is a set of Commonly Used Tools for Experiments, or Critically Useful Tools for Experiments, depending on who you ask. It is a library aggregating various Ruby snippets useful in the context of (but not limited to) developing experiment software on distributed systems testbeds such as Grid'5000. Ruby-Cute is structured in several modules. The G5K module allows you to communicate with Grid'5000. The Net::SSH::Multi module allows the parallel execution of commands on several remote machines using the SSH protocol. The TakTuk module is a wrapper around the TakTuk parallel command executor.
  • Funk helps you to find resources for your experiments, by:
    • giving you the number of nodes available at a given date and for a given walltime
    • finding the slots for a combination of resources and a walltime
    • finding the slot with the maximum number of nodes for a period and a walltime
    • managing the reservation of the resources
  • EnOSlib is a Python library that factors out common experiment practices, especially when deploying distributed applications. EnOSlib uses different providers to get resources from an infrastructure. For instance, on Grid'5000 one can easily get a physical environment (non-deploy or deploy job, with or without multiple NICs configured) or a virtualized environment (e.g. based on KVM virtual machines). Resources are configured using safe parallel actions (based on Ansible modules) or using off-the-shelf packaged applications (e.g. a monitoring stack, a distributed network packet sniffer).
  • Dask-jobqueue is a Python library that makes it easy to deploy Dask on the job queuing systems commonly found on supercomputers, at academic research institutions, and on other clusters. It can ease switching between resource managers, for example between OAR-based and Slurm-based ones.
  • Nipype is open-source software developed in Python by the neuroimaging community. It provides a generic interface for analyzing data with a variety of algorithms or existing neuroimaging software. It can be used to create complex workflows that run in parallel on many cores or machines.
  • Submitit is a lightweight tool for submitting Python functions for computation on a Slurm cluster. It acts as a wrapper for submission and provides access to results, logs and more. It allows seamless switching between execution on Slurm and in a local environment. To ease the switch between OAR- and Slurm-based resource managers for users of Inria's national computing infrastructure, OAR support has been added to Submitit through a plugin called submitit_oar.


These tools are independent and can be studied in any order.

Execo

A tutorial can be found here, and an example of usage is also presented in the BigData hands-on tutorial.
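Execo hides asynchronous process control behind its own classes. As a rough, standard-library-only illustration of the underlying idea (this is not Execo's API), the sketch below starts several local commands concurrently, then waits for all of them and collects their exit codes and output. The helper name `run_in_parallel` is hypothetical.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them.

    Hypothetical helper, not part of Execo. Each command is a list of
    arguments; returns a list of (returncode, stdout) in input order.
    """
    # Launch all processes without blocking: they run concurrently.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    results = []
    for proc in procs:
        # communicate() waits for this process and drains its stdout.
        out, _ = proc.communicate()
        results.append((proc.returncode, out))
    return results


if __name__ == "__main__":
    cmds = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(3)]
    for code, out in run_in_parallel(cmds):
        print(code, out.strip())
```

Prefixing each command with `["ssh", host]` would run it on a remote host instead, assuming passwordless SSH access; Execo adds the bookkeeping (timeouts, error handling, many hosts) that this sketch leaves out.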

Ruby-Cute

A tutorial can be found here.

Funk

Funk stands for (F)ind yo(U)r (N)odes on g5(K). More details can be found here.
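To make the questions Funk answers more concrete, here is a simplified, hypothetical model: each node has a list of busy intervals (in hours from some origin), and we look for the start time within a period that maximizes the number of nodes free for the whole walltime. This is not Funk's implementation or data model; the function names, the interval representation and the fixed scanning step are assumptions.

```python
def free_nodes(busy, start, walltime):
    """Return the nodes with no busy interval overlapping [start, start + walltime).

    `busy` maps a node name to a list of (begin, end) busy intervals,
    with `end` exclusive. Hypothetical data model, not Funk's.
    """
    end = start + walltime
    return sorted(
        node
        for node, intervals in busy.items()
        # Two half-open intervals are disjoint if one ends before the other starts.
        if all(e <= start or b >= end for b, e in intervals)
    )


def best_slot(busy, period_start, period_end, walltime, step=1):
    """Scan candidate start times (every `step` units) so that the job ends
    by `period_end`, and return (start, nodes) for the slot with the most
    free nodes. Ties keep the earliest start; (None, []) if nothing is free."""
    best = (None, [])
    start = period_start
    while start + walltime <= period_end:
        nodes = free_nodes(busy, start, walltime)
        if len(nodes) > len(best[1]):
            best = (start, nodes)
        start += step
    return best


if __name__ == "__main__":
    busy = {"node-1": [(0, 2)], "node-2": [(1, 3)], "node-3": []}
    print(free_nodes(busy, 0, 1))     # ['node-2', 'node-3']
    print(best_slot(busy, 0, 6, 2))   # (3, ['node-1', 'node-2', 'node-3'])
```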

EnOSlib

Tutorials and API documentation can be found here.

In particular, the tutorial page has many examples to work with Grid'5000.

Dask-jobqueue

A tutorial can be found here.
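Dask-jobqueue hides batch-system specifics behind per-scheduler cluster classes such as `OARCluster` and `SLURMCluster`. The hypothetical sketch below shows why such an abstraction helps when switching resource managers: one resource request (node count and walltime) is rendered either as OAR or as Slurm batch directives. The function names are made up, and only these two basic directives are covered.

```python
def batch_header(scheduler, nodes, walltime):
    """Render a node count and a walltime ("HH:MM:SS") as batch-script
    directives for "oar" or "slurm". Hypothetical helper."""
    if scheduler == "oar":
        # OAR takes resources and walltime in a single -l option.
        return [f"#OAR -l nodes={nodes},walltime={walltime}"]
    if scheduler == "slurm":
        return [f"#SBATCH --nodes={nodes}", f"#SBATCH --time={walltime}"]
    raise ValueError(f"unsupported scheduler: {scheduler!r}")


def batch_script(scheduler, nodes, walltime, command):
    """Assemble a complete bash batch script around `command`."""
    lines = ["#!/bin/bash", *batch_header(scheduler, nodes, walltime), command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for scheduler in ("oar", "slurm"):
        print(batch_script(scheduler, 2, "01:00:00", "python run.py"))
```

The experiment code only chooses a scheduler name; everything scheduler-specific stays in one place, which is the same design choice that lets Dask-jobqueue users swap one cluster class for another.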

Nipype

A tutorial can be found here.
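Nipype describes a workflow as a graph of processing nodes and can hand independent nodes to a parallel execution plugin. The standard-library-only sketch below illustrates that idea and is not Nipype's API: tasks with dependencies run in topological order, and every task whose prerequisites are done is submitted to a thread pool at the same time. The `run_workflow` name and the task layout are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow(tasks, deps, max_workers=4):
    """Run `tasks` (name -> zero-argument callable) while respecting `deps`
    (name -> set of prerequisite names). Independent tasks run in parallel.
    Returns a dict mapping each task name to its result."""
    sorter = TopologicalSorter(deps)
    # Tasks without declared dependencies must still appear in the graph.
    for name in tasks:
        sorter.add(name)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every task whose prerequisites have all finished.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # may unlock dependent tasks
    return results


if __name__ == "__main__":
    tasks = {"load": lambda: 1, "clean": lambda: 2, "analyze": lambda: 3}
    print(run_workflow(tasks, {"analyze": {"load", "clean"}}))
```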

Submitit

A tutorial can be found here.
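Submitit follows a submit-then-collect pattern: a Python function and its arguments are submitted to an executor, and the returned job handle gives access to the result later. The sketch below reproduces only that pattern with the standard library's `concurrent.futures`; it does not use Submitit, and the `make_executor` factory and its `backend` values are hypothetical. Providing Slurm- or OAR-backed executors behind the same kind of interface is what Submitit and submitit_oar add.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def add(a, b):
    """Example payload; a top-level function so process pools can pickle it."""
    return a + b


def make_executor(backend="threads", max_workers=2):
    """Hypothetical factory: calling code relies only on submit()/result(),
    so the execution backend can be changed without touching it."""
    if backend == "threads":
        return ThreadPoolExecutor(max_workers=max_workers)
    if backend == "processes":
        return ProcessPoolExecutor(max_workers=max_workers)
    raise ValueError(f"unknown backend: {backend!r}")


if __name__ == "__main__":
    with make_executor("threads") as executor:
        job = executor.submit(add, 5, 7)  # returns immediately with a handle
        print(job.result())  # blocks until done, prints 12
```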