HPC and HTC tutorial


Grid'5000 gives easy access to a wide variety of hardware technologies and is particularly suitable for carrying out HPC (high performance computing) experiments: users can investigate parallel algorithms, scalability problems or performance portability on Grid'5000. Whereas HPC production systems generally impose rather rigid restrictions (no root access, no way to install system-wide software, no SSH connection to the compute nodes, no Internet access...), Grid'5000 does not suffer from these common limitations of HPC systems. In particular, Grid'5000 has a job scheduling policy that allows resources to be reserved in advance, which is useful for setting up an experiment on your own schedule. You can also reinstall cluster nodes and gain root access for the duration of your jobs using Kadeploy. This can be used to control the entire software stack, experiment with runtime environments, fine-tune network parameters (e.g. the MTU), or simply ensure the reproducibility of your experiments by freezing their context. In addition, Grid'5000 provides a set of tools for monitoring experiments that you might find especially useful for detecting problems such as network contention in distributed algorithms.
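
As an illustration, a typical deployment job looks like the following minimal sketch (the environment name jessie-x64-base and the walltime are assumptions; check the environments actually available on your site). The -k option copies your SSH key to the deployed nodes so you can log in as root:

flyon: oarsub -I -t deploy -l nodes=1,walltime=2
flyon: kadeploy3 -f $OAR_NODE_FILE -e jessie-x64-base -k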

Discovering HPC resources on Grid'5000

The easiest way to get the global picture of the HPC systems available on Grid'5000 is to consult the Hardware page. This page is built using the Grid'5000 Reference API and describes in detail the CPU models, network interfaces and accelerators of each cluster. You can also use the API Quick Start page, as it provides advanced filters for selecting nodes by hardware capability. Alternatively, you can parse the Grid'5000 Reference API yourself to discover the available resources on each site.
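
For instance, you can query the Reference API directly from a frontend (a sketch, assuming the stable version of the API and the json_pp tool for pretty-printing):

fnancy: curl -s https://api.grid5000.fr/stable/sites/nancy/clusters | json_pp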

Resource reservation on Grid'5000

Resource reservation using the OAR scheduler is covered by the Getting Started tutorial. You can select specific hardware using the "-p" (properties) option of the oarsub command. The properties available on each site are listed on the Monika pages linked from the Status page (for instance, see the Monika page for Nancy). You can combine OAR properties or even use SQL syntax for advanced filtering, as shown in the examples below.

Here is a non-exhaustive list of OAR properties useful for HPC experiments:

  • CPU: cpuarch, cpucore, cpufreq, cputype
  • Memory (RAM in MB): memnode (memory per node), memcpu (per CPU), memcore (per core)
  • Network: ethnb (number of network interfaces), eth10g, myri10g (Myrinet), ib20g (InfiniBand), ib40g
  • Accelerator: gpu (YES/SHARED/NO), gpu_count (number of GPUs per node), mic (YES/NO)

For example, you can make a reservation at Lyon for a GPU node using:

flyon: oarsub -I -p "gpu='YES'"

Or get a node with at least 256 GB of RAM at Nancy:

fnancy: oarsub -I -p "memnode>256000"

Using Grid'5000 resources as an HPC production system

The first intent of Grid'5000 is to be a testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing including Cloud, HPC and Big Data. However, Grid'5000 offers such a large amount of resources that its idle resources can be used for more production-oriented projects. These include HTC (high-throughput computing) projects requiring the execution of a large number of loosely-coupled tasks. This usage of Grid'5000 is only allowed for projects connected with computer science research, and the jobs must be submitted using the besteffort mode of the scheduler.

Besteffort jobs are executed on idle resources and are killed when a regular job requests the resources. Therefore, besteffort jobs need to be monitored. In particular, long-running jobs should implement a checkpointing mechanism. If your job is of the type besteffort and idempotent (oarsub -t besteffort -t idempotent) and is killed by the OAR scheduler, another job is automatically created and queued with the same configuration (note that your job is also resubmitted if the exit code of your program is 99).
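
As an illustration, here is a minimal sketch of such a checkpointed campaign (the script name, the state file and the do_work_step command are hypothetical; oarsub's --checkpoint option asks OAR to send a signal shortly before killing the job, and SIGUSR2 is assumed here as the checkpoint signal, which can be changed with the --signal option):

fnancy: oarsub -t besteffort -t idempotent --checkpoint 60 ./my_campaign.sh

#!/bin/bash
# my_campaign.sh -- hypothetical checkpointed besteffort job.
STATE=state.dat

save_and_resubmit() {
    echo "$i" > "$STATE"    # save progress before being killed
    exit 99                 # exit code 99 asks OAR to resubmit the job
}
trap save_and_resubmit SIGUSR2   # checkpoint signal sent by OAR

# Resume from the saved state if a previous run was interrupted.
i=0
[ -f "$STATE" ] && i=$(cat "$STATE")

while [ "$i" -lt 100000 ]; do
    do_work_step "$i" &   # hypothetical unit of work, run in background
    wait "$!"             # wait returns early if the trapped signal arrives
    i=$((i + 1))
done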

Applications that submit a large number of tasks (i.e. Bag-of-Tasks campaigns) should consider using CiGri, a grid middleware that dispatches the workload over the whole Grid'5000 infrastructure and distributes the idle resources fairly among its users without overloading the infrastructure.

Using HPC hardware on Grid'5000

The rest of this tutorial is divided into distinct parts that can be done in any order: