Experiment scripting tutorial
Latest revision as of 16:01, 17 July 2024
Note: This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.
This page presents several tools that aim at facilitating experiment scripting on Grid'5000.
- Execo offers a Python API for asynchronous control of local or remote, standalone or parallel, Unix processes. It is especially well suited for quickly scripting workflows of parallel/distributed operations on local or remote hosts: automating a scientific workflow, conducting computer science experiments, running automated tests, etc. The core Python package is execo. The execo_g5k package provides a set of tools and extensions for the Grid'5000 testbed. The execo_engine package provides tools to ease the development of computer science experiments.
- Ruby-Cute is a set of Commonly Used Tools for Experiments, or Critically Useful Tools for Experiments, depending on who you ask. It is a library aggregating various Ruby snippets useful for (but not limited to) developing experiment software on distributed systems testbeds such as Grid'5000. Ruby-Cute is structured in several modules: the G5K module allows you to communicate with Grid'5000, the Net::SSH::Multi module allows the parallel execution of commands on several remote machines using the SSH protocol, and the TakTuk module is a wrapper around the TakTuk parallel command executor.
- Funk helps you to find resources for your experiments by:
  - giving you the number of nodes available at a given date and for a given walltime
  - finding the slots for a combination of resources and a walltime
  - finding the slot with the maximum number of nodes for a period and a walltime
  - managing the reservation of the resources
- EnOSlib is a Python library that factors out common experiment practices, especially when dealing with deployments of distributed applications. EnOSlib uses different providers to get resources from an infrastructure. For instance, on Grid'5000 one can easily get a physical environment (a non-deploy or deploy job, with or without multiple NICs configured) or a virtualized environment (e.g. based on KVM virtual machines). Resources are configured using safe parallel actions (based on Ansible modules) or using off-the-shelf packaged applications (e.g. a monitoring stack, a distributed network packet sniffer).
- Dask-jobqueue is a Python library that makes it easy to deploy Dask on common job queuing systems, such as those found on high-performance supercomputers, in academic research institutions, and on other clusters. It can be used to ease switching between resource managers, for example between OAR and Slurm.
- Nipype is open-source software developed in Python by the neuroimaging community. It provides a generic interface for analyzing data with a variety of algorithms or existing neuroimaging software, and can be used to create complex workflows that run in parallel on many cores or machines.
- Submitit is a lightweight tool for submitting Python functions for computation on a Slurm cluster. It acts as a wrapper around job submission and provides access to results, logs and more. It allows seamless switching between execution on Slurm and local execution. To ease switching between OAR- and Slurm-based resource managers for users of Inria's national computing infrastructure, OAR support has been added to Submitit through a plugin called submitit_oar.
Those tools are independent and can be studied in any order.
Execo
A tutorial can be found on the Execo Practical Session wiki page, and an example of usage is also presented in the BigData hands-on tutorial.
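Execo's core idea, starting several processes asynchronously and then waiting for all of them to finish, can be illustrated with the standard library alone. The sketch below is not Execo's API: it uses `subprocess.Popen` to start local commands in parallel, as a stand-in for the local or remote processes Execo manages, and the helper name `run_parallel` is hypothetical.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start every command at once, then wait for all of them.

    Hypothetical helper: a stdlib stand-in for Execo-style parallel
    process control, not part of Execo itself.
    """
    # Launch all processes without waiting for any of them (asynchronous start).
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    results = []
    for proc in procs:
        # communicate() waits for this process to exit and collects its output.
        out, err = proc.communicate()
        results.append({"returncode": proc.returncode, "stdout": out, "stderr": err})
    return results


# Each command runs the current Python interpreter, so the example is portable.
cmds = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(3)]
for res in run_parallel(cmds):
    print(res["returncode"], res["stdout"].strip())
```

Execo adds what this sketch lacks, notably remote execution over SSH and Grid'5000 integration through execo_g5k.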
Ruby-Cute
A tutorial can be found in the Ruby-Cute repository, in examples/g5k-tutorial.md.
Funk
Funk stands for (F)ind yo(U)r (N)odes on g5(K). More details can be found on the Funk wiki page.
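One of the capabilities listed for Funk, finding the slot with the maximum number of nodes for a period and a walltime, boils down to a small search. The sketch below is a hypothetical illustration, not Funk's implementation: availability is modelled as one set of free node names per hourly time step (an assumption made for the example), and a node counts for a slot only if it is free during every hour of the walltime.

```python
def best_slot(free_nodes_per_hour, period_start, period_end, walltime):
    """Return (start_hour, nodes) for the walltime-long slot inside
    [period_start, period_end) that keeps the most nodes free throughout.

    Hypothetical helper for illustration; not Funk's algorithm or API.
    Returns (None, set()) when no slot has any free node.
    """
    if walltime < 1:
        raise ValueError("walltime must be at least one hour")
    # Never look past the end of the availability data.
    period_end = min(period_end, len(free_nodes_per_hour))
    best_start, best_nodes = None, set()
    for start in range(period_start, period_end - walltime + 1):
        window = free_nodes_per_hour[start:start + walltime]
        # A node is usable only if it is free during every hour of the window.
        usable = set.intersection(*window)
        if len(usable) > len(best_nodes):
            best_start, best_nodes = start, usable
    return best_start, best_nodes


# Hypothetical availability: free nodes for hours 0 to 3.
availability = [
    {"n1", "n2", "n3"},
    {"n1", "n2"},
    {"n1", "n2", "n3", "n4"},
    {"n2", "n3", "n4"},
]
start, nodes = best_slot(availability, 0, 4, 2)
print(start, sorted(nodes))  # 2 ['n2', 'n3', 'n4']
```

Intersecting node sets, rather than taking the minimum of per-hour counts, matters because a reservation needs the same nodes for its whole duration.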
EnOSlib
Tutorials and API documentation can be found on the EnOSlib documentation website (https://discovery.gitlabpages.inria.fr/enoslib/).
In particular, its Grid'5000 tutorial page has many small examples for working with Grid'5000.
Full EnOSlib tutorials given recently:
- (2024) tutorial at ACM REP'24 in Rennes (source code)
- (2021) tutorial at RSD Research Autumn School in Strasbourg
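The "safe parallel actions" mentioned in the EnOSlib description refer to Ansible-style operations that describe a desired state and are idempotent: running them a second time changes nothing, and each host reports whether it was changed. The stdlib-only sketch below illustrates that idea; it is not EnOSlib's API, and the in-memory `hosts` dictionary (host name to list of configuration lines) and the `ensure_line`/`apply_everywhere` helpers are hypothetical stand-ins for real machines and modules.

```python
from concurrent.futures import ThreadPoolExecutor


def ensure_line(config_lines, line):
    """Idempotent action: make sure `line` is present.

    Returns True if the host was changed, False if it was already in the
    desired state (mirroring Ansible's changed/ok reporting).
    """
    if line in config_lines:
        return False
    config_lines.append(line)
    return True


def apply_everywhere(hosts, line):
    """Apply the action to every host in parallel; return {host: changed}."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(ensure_line, lines, line)
                   for name, lines in hosts.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical in-memory "hosts": name -> configuration lines.
hosts = {"node-1": [], "node-2": ["ntp=on"]}
print(apply_everywhere(hosts, "ntp=on"))  # {'node-1': True, 'node-2': False}
print(apply_everywhere(hosts, "ntp=on"))  # {'node-1': False, 'node-2': False}
```

Because the action checks the current state before acting, rerunning a partially failed deployment is harmless, which is what makes such actions safe to apply in parallel and repeatedly.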
Dask-jobqueue
A tutorial can be found on the Dask-jobqueue wiki page.
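Switching between resource managers is easier when the scheduler-specific job script is generated from a single resource description. The sketch below illustrates that idea in isolation by rendering one request as OAR or as Slurm directives. It is not Dask-jobqueue's API: the `Request` fields and the `render_header` function are hypothetical, and mapping OAR's cores per host to Slurm's `--cpus-per-task` with one task per node is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical scheduler-neutral resource request."""
    name: str
    nodes: int
    cores_per_node: int
    walltime: str  # "HH:MM:SS"


def render_header(req, scheduler):
    """Render the same request as OAR or Slurm batch-script directives."""
    if scheduler == "oar":
        return "\n".join([
            "#!/bin/bash",
            f"#OAR -n {req.name}",
            f"#OAR -l host={req.nodes}/core={req.cores_per_node},walltime={req.walltime}",
        ])
    if scheduler == "slurm":
        return "\n".join([
            "#!/bin/bash",
            f"#SBATCH --job-name={req.name}",
            f"#SBATCH --nodes={req.nodes}",
            "#SBATCH --ntasks-per-node=1",  # assumption: one task per node
            f"#SBATCH --cpus-per-task={req.cores_per_node}",
            f"#SBATCH --time={req.walltime}",
        ])
    raise ValueError(f"unknown scheduler: {scheduler}")


print(render_header(Request("dask-worker", 2, 4, "01:00:00"), "oar"))
```

The experiment code only builds a `Request`; which scheduler receives it becomes a single parameter.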
Nipype
A tutorial can be found on the Nipype wiki page.
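Nipype workflows are graphs of processing steps whose independent branches can run concurrently. The scheduling idea behind that can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+): each step runs as soon as all the steps it depends on have finished. This is not Nipype's API or execution engine; `run_workflow` and the step names and functions below are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow(steps, deps, max_workers=4):
    """Run `steps` (name -> function taking a dict of upstream results),
    respecting `deps` (name -> set of upstream step names).
    Steps whose dependencies are all satisfied run in parallel."""
    sorter = TopologicalSorter(deps)
    # Make sure steps without declared dependencies are part of the graph too.
    for name in steps:
        sorter.add(name)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every step that has just become ready.
            for name in sorter.get_ready():
                inputs = {up: results[up] for up in deps.get(name, ())}
                running[pool.submit(steps[name], inputs)] = name
            # Wait for at least one running step, then unlock its successors.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                sorter.done(name)
    return results


steps = {
    "load": lambda _: [3, 1, 2],
    "sort": lambda inp: sorted(inp["load"]),
    "total": lambda inp: sum(inp["load"]),
    "report": lambda inp: f"{inp['sort']} sum={inp['total']}",
}
deps = {"sort": {"load"}, "total": {"load"}, "report": {"sort", "total"}}
print(run_workflow(steps, deps)["report"])  # [1, 2, 3] sum=6
```

Here "sort" and "total" both depend only on "load", so they may run at the same time; "report" waits for both.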
Submitit
A tutorial can be found on the Submitit wiki page.
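Switching between Slurm and local execution works because experiment code only talks to an executor: it submits a function and later asks the returned job for its result, the same submit-then-result pattern used in Submitit's examples. The sketch below shows that pattern with the standard library's `concurrent.futures` interface; it is not Submitit's API, and `InlineExecutor`, `run_sweep` and `square` are hypothetical names. Swapping the executor changes where the work runs, not the experiment code.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class InlineExecutor(Executor):
    """Hypothetical executor that runs each call immediately in the
    current process, which is handy for debugging."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:  # keep the error in the job, like a remote backend would
            future.set_exception(exc)
        return future


def run_sweep(executor, fn, params):
    """Submit one job per parameter, then collect results in submission order."""
    jobs = [executor.submit(fn, p) for p in params]
    return [job.result() for job in jobs]


def square(x):
    return x * x


print(run_sweep(InlineExecutor(), square, [1, 2, 3]))  # [1, 4, 9]
with ThreadPoolExecutor(max_workers=2) as pool:
    print(run_sweep(pool, square, [1, 2, 3]))  # [1, 4, 9]
```

An executor backed by a cluster scheduler would slot into `run_sweep` in the same way, which is the design choice that makes the local/cluster switch cheap.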