Hemera:WG:Simulators

From Grid5000

Modeling Large Scale Systems and Validating their Simulators (Simulators)

Leaders: Martin Quinson (ALGORILLE), Arnaud Legrand (MESCAL)

Real testbeds like Grid’5000 are not the only way to experiment with large-scale distributed applications. Researchers often rely on tools such as simulators (based on models of the platform) or emulators (using real resources, but with artificially degraded performance to offer experimental control). These tools can prove valuable when users do not have access to a real testbed, or want to quickly experiment with an idea before developing a full-fledged solution.

Because of the complexity of modern systems and the rapid evolution of technology, measurements and analyses often quickly become obsolete (not to mention the models built from these analyses). As a consequence, computer measurements do not lead to a universal and enduring truth. In particular, researchers should regularly check the validity of their models and precisely document the circumstances under which the data they use was collected. Unfortunately, this good practice is not very common: most people use tools (their own or others') without ever questioning the relevance of the underlying models.

This working group focuses on the challenge of developing validated models and tools (simulators and emulators) for large-scale systems, which would constitute scientific instruments for specialists in the field.

Three kinds of benefits are expected. First, it would make it possible to assess the accuracy of simulators by comparing them with real experiments conducted on Grid’5000. Second, it can help identify bad configurations of Grid’5000. Third, simulators make it easier to gather statistics about an application, because the application's behavior is not affected by monitoring overhead as it is in real experiments.

To reach this goal, several points are to be considered:

  • Model the main resources of grid systems, and how contention and resource sharing between competing users impact overall availability. The parts to be modeled include the CPU, disks and memory of computers as well as SAN, LAN and WAN networks. These models could be analytical, discrete-event, or based on other approaches.
  • Characterize typical instantiations of the models for common settings. This should combine static information about the topology and peak performance of the system with dynamic information about availability and failures over typical periods of time.
  • Provide generic frameworks and experimental testbeds to compare these models with each other and with the behaviors observed in real settings. In this context, the experimental platform offered by Grid’5000 will enable us to rigorously confront models and simulators with reality.
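The contention and resource-sharing modeling mentioned in the first point can be illustrated with one classic analytical approach, max-min fair bandwidth sharing, as commonly used in flow-level network simulators. The sketch below is a minimal illustration under simplifying assumptions (a static set of flows, each limited only by the links on its route, capacities in arbitrary but consistent units). The function name `max_min_share` and the toy topology are hypothetical and are not taken from any tool developed by this working group.

```python
from typing import Dict, List


def max_min_share(capacities: Dict[str, float],
                  routes: Dict[str, List[str]]) -> Dict[str, float]:
    """Compute max-min fair rates by progressive filling (illustrative sketch).

    capacities: link name -> capacity (any consistent unit, e.g. MB/s)
    routes: flow name -> list of links the flow traverses
    Assumption: flows with an empty route are left at rate 0.
    """
    remaining = dict(capacities)                  # unallocated capacity per link
    active = {f for f, links in routes.items() if links}  # flows not yet bottlenecked
    rates = {f: 0.0 for f in routes}

    while active:
        # Equal share each link could still give to each of its active flows.
        shares = {}
        for link, cap in remaining.items():
            users = [f for f in active if link in routes[f]]
            if users:
                shares[link] = cap / len(users)
        if not shares:
            break

        # The most constrained link sets the next uniform rate increment.
        bottleneck = min(shares, key=shares.get)
        inc = shares[bottleneck]
        for f in active:
            rates[f] += inc
            for link in routes[f]:
                remaining[link] -= inc

        # Freeze every flow that crosses a saturated link (relative tolerance
        # absorbs floating-point rounding).
        saturated = {l for l, c in remaining.items()
                     if c <= 1e-9 * capacities[l]}
        active = {f for f in active if not saturated.intersection(routes[f])}

    return rates
```

For example, with links `L1` (capacity 10) and `L2` (capacity 4), flow `a` on `L1`, flow `b` on both links and flow `c` on `L2`, `max_min_share({"L1": 10, "L2": 4}, {"a": ["L1"], "b": ["L1", "L2"], "c": ["L2"]})` returns `{'a': 8.0, 'b': 2.0, 'c': 2.0}`: `L2` is the bottleneck for `b` and `c`, and `a` then takes the capacity left on `L1`. Validating such a model would consist of comparing these predicted rates with transfer rates measured on a real platform.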