Grid5000:Gotchas

	Note
	For a more up-to-date list of gotchas, see https://www.grid5000.fr/status/artifact/

This page documents various gotchas (counter-intuitive features of Grid'5000) that could affect users' experiments in surprising ways.

Network

Global and per-site network documentation can be found on the Grid5000:Network page.

Topology of ethernet networks

Most large clusters have a hierarchical ethernet topology, because ethernet switches with a large number of ports are too expensive. A good example of such a hierarchical topology is Rennes:Network for the paravance and parasilo clusters, where nodes are connected to 3 different switches. When an experiment uses the ethernet network intensively, it is a good idea to request nodes on the same switch, e.g. with oarsub -l switch=1/nodes=5, or to request nodes connected to a specific switch, e.g. with oarsub -p "switch='cisco2'" -l nodes=5.

Performance of ethernet networks

The backplane bandwidth of ethernet switches usually doesn't allow full-speed communication between all the ports of the switch at the same time.

High-performance networks

The topology of Infiniband and Omni-Path networks is generally less surprising. Two "fat-tree" topologies can be found on the testbed:

non-blocking (1:1): the number of up-link ports (from leaf switches to top switches) is equal to the number of down-link ports (nodes to leaf switches). As a result, all nodes can communicate with each other at full speed.
blocking (2:1): the number of up-link ports (from leaf switches to top switches) is half the number of down-link ports (nodes to leaf switches). As a result, nodes on the same leaf switch can communicate with each other at full speed, but not with nodes on other leaf switches.
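
The effect of the blocking factor can be illustrated with a little arithmetic. The sketch below is only an illustration, not a measurement of any Grid'5000 network: it assumes that all links have the same speed, that every node of a leaf switch sends to nodes on other leaf switches at the same time, and that traffic is spread evenly over the up-links. The function name and the port counts and link speed in the usage lines are hypothetical.

```python
from fractions import Fraction


def inter_leaf_bandwidth_per_node(link_gbps, downlinks, uplinks):
    """Worst-case bandwidth (Gbps) available to each node of a leaf switch
    when all of its nodes talk to nodes on other leaf switches at once.

    Assumptions (not taken from any real cluster): identical link speeds
    and traffic evenly balanced over the up-links.
    """
    if downlinks <= 0 or uplinks <= 0:
        raise ValueError("port counts must be positive")
    # Each node gets at most its own link, and at most an equal share
    # of the aggregated up-link capacity.
    share = min(Fraction(1), Fraction(uplinks, downlinks))
    return float(share * link_gbps)


# Hypothetical leaf switch with 100 Gbps links.
non_blocking = inter_leaf_bandwidth_per_node(100, downlinks=24, uplinks=24)
blocking = inter_leaf_bandwidth_per_node(100, downlinks=24, uplinks=12)
print(non_blocking, blocking)
```

Under these assumptions, a 2:1 blocking factor halves the bandwidth available to each node for traffic leaving its leaf switch, while traffic that stays within the leaf switch is unaffected.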

Compute nodes

All Grid'5000 clusters are supposed to contain homogeneous (identical) sets of nodes, but there are some exceptions.

Global and per-site cluster documentation can be found on the Hardware page.

Hard disks

Due to their high failure rate, hard disks tend to get replaced frequently, and it is not always possible to keep the same model during the whole life of a cluster. If this is important to you, please check the exact disk model using the reference API, where storage is described in detail for each node.
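
As a minimal sketch of such a check, the function below lists the disk models found in a node description that has already been retrieved from the reference API and parsed from JSON. The field names used here (storage_devices, device, model) are assumptions about the layout of that description and should be verified against the actual API output. The sample node and its model strings are made up for illustration.

```python
def disk_models(node_description):
    """Return (device, model) pairs from a parsed node description.

    Assumption: disks are listed under a "storage_devices" key, either as
    a list of entries or as a mapping from device name to entry, and each
    entry has "device" and "model" fields.
    """
    devices = node_description.get("storage_devices", [])
    if isinstance(devices, dict):
        devices = list(devices.values())
    return [(d.get("device"), d.get("model")) for d in devices]


# Hypothetical node description, not real Grid'5000 data.
sample_node = {
    "uid": "example-1",
    "storage_devices": [
        {"device": "sda", "model": "EXAMPLE-MODEL-A"},
        {"device": "sdb", "model": "EXAMPLE-MODEL-B"},
    ],
}

for device, model in disk_models(sample_node):
    print(device, model)
```

Running the same function over all the nodes of a reservation and comparing the results is one way to detect nodes whose disks have been replaced by a different model.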

Software

The standard environment (the one users get when not deploying) on all compute nodes is identical for a given architecture (x86-64, arm64 or ppc64), with the exception of additional drivers and software to support GPUs and high-speed networks on sites where they are available.
The user frontends are identical on all sites.
The reference environments (*-$arch-{min,base,nfs,xen,big}) are identical on all sites, for a given architecture.

Depending on the CPU architecture, some features differ between environments:

Feature	x86-64	arm64	ppc64	env
Infiniband				base
OmniPath				base
NFS				nfs
Ceph				nfs
Xen Dom0				xen
CUDA				big
BeeGFS				big
OpenMPI				big
