Booting and Using Virtual Machines on Grid'5000

{{Portal|Tutorial}}


''Under construction''
= Overview =
 
The goal of this tutorial is to introduce the use of a large number of virtual machines on the Grid'5000 platform.
 
After a short section about Grid'5000 specifications and the requirements of this session, users will be presented with a set of scripts and tools built on the Grid'5000 software stack to deploy and interact with a significant number of virtual machines.
 
These instances can then be used at the user's convenience to investigate particular concerns.
 
= Grid'5000 specifications =
 
When booting KVM instances on the platform, we need physical hosts that support hardware virtualization.
During the deployment, we are in charge of configuring the IP network used by the virtual machines through the [[KaVLAN|network isolation]] capabilities of Grid'5000 and the [[Subnet reservation|subnet reservation system]].
This enables the use of an IP network ranging from a /22 to a /16 subnet, and ensures communication with the various instances.
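To get a feel for those subnet sizes, the short shell sketch below (plain shell arithmetic, not a Grid'5000 tool) computes how many addresses each prefix length provides:

```shell
# Compute the number of IPv4 addresses in a subnet from its prefix length.
# Plain shell integer arithmetic; not a Grid'5000 command.
prefix_to_ips() {
    echo $(( 1 << (32 - $1) ))
}

echo "/22 provides $(prefix_to_ips 22) addresses"
echo "/16 provides $(prefix_to_ips 16) addresses"
echo "a /16 contains $(( $(prefix_to_ips 16) / $(prefix_to_ips 22) )) /22 subnets"
```

A /22 thus covers about a thousand virtual machines, while a /16 is needed for experiments at the scale described later in this tutorial.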
 
= Deployment =
 
 
The first step is to retrieve the latest version of the scripts
{{Term|location=frontend|cmd=<code class="command">scp -r /home/alebre/dynVM__TEMPLATE__/ '''DESTINATION_FOLDER''' </code>}}
 
 
Move to the folder containing the code
{{Term|location=frontend|cmd=<code class="command">cd '''DESTINATION_FOLDER''' </code>}}
 
 
Book the Grid'5000 resources according to the desired time and duration of your experiment.
We will refer to the site from which you book the resources and execute the scripts as the ''master site''.
This script will return an OAR request that includes the reservation of the nodes, a virtual network and a subnet
{{Term|location=frontend|cmd=<code class="command"> ./getmaxgridnodes.sh  2012-07-24 19:00:00 03:00:00 </code>}}
 
 
Execute the OAR request returned by getmaxgridnodes.sh (don't forget to redirect its output as shown in the example).
The master site is sophia in the following example.
{{Term|location=frontend|cmd=<code class="command"> oargridsub -t deploy -w 3:00:00 -s "2012-07-24 19:00:00" sophia:rdef="{\\\\\\\"type='kavlan-global'\\\\\\\"}/vlan=1+/nodes=5+/slash_18=1",rennes:rdef=/nodes=5,nancy:rdef=/nodes=5,lille:rdef=/nodes=5 > ~/oargrid.out</code>}}
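The redirected file can later be parsed to recover the grid reservation id. The sketch below is a hedged illustration: it assumes oargridsub prints a line of the form <code>Grid reservation id = N</code> (the exact format may vary between versions), and uses a simulated file so it is self-contained:

```shell
# Simulated oargridsub output (hypothetical content; the real file is ~/oargrid.out).
cat > /tmp/oargrid.out <<'EOF'
[OAR_GRIDSUB] Grid reservation id = 42424
EOF

# Extract the numeric reservation id after the '=' sign.
GRID_JOB_ID=$(grep "Grid reservation id" /tmp/oargrid.out | cut -d= -f2 | tr -d ' ')
echo "$GRID_JOB_ID"
```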
 
 
Move to the Flauncher directory
{{Term|location=frontend|cmd=<code class="command"> cd Flauncher</code>}}
 
 
Get the list of nodes and connect to the OAR grid job
{{Term|location=frontend|cmd=<code class="command"> ./grid_get_nodes.sh</code>}}
 
 
You are now connected to the grid OAR job
 
 
Display the list of nodes
{{Term|location=frontend|cmd=<code class="command"> less ~/gridnodes-uniq </code>}}
 
 
Move to the Flauncher directory
{{Term|location=frontend|cmd=<code class="command"> cd Flauncher</code>}}
 
 
Deploy the nodes
{{Term|location=frontend|cmd=<code class="command"> ./grid_deploy_nodes.sh</code>}}
 
 
Deploy the VLAN and set up the hypervisors (Warning: use lowercase letters and repeat the master site as the first site)
{{Term|location=frontend|cmd=<code class="command"> ./vlan_auto_launch.sh sophia "sophia nancy lille rennes"</code>}}
 
 
We use a service node during the process.
 
Retrieve the service node
<pre>
service_node=$(sed -n '/sophia/p' ./log/machines-list.txt | head -n1)
</pre>
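To see what the command above does, here is a self-contained rerun on a hypothetical machines list (the file contents are made up for illustration; the real entries come from <code>./log/machines-list.txt</code>):

```shell
# Hypothetical machines list standing in for ./log/machines-list.txt.
cat > /tmp/machines-list.txt <<'EOF'
griffon-1.nancy.grid5000.fr
sol-3.sophia.grid5000.fr
sol-7.sophia.grid5000.fr
EOF

# Keep only the lines matching the master site, then take the first one.
service_node=$(sed -n '/sophia/p' /tmp/machines-list.txt | head -n1)
echo "$service_node"
```

The first machine of the master site in the list is selected as the service node.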
 
Connect to the service node
{{Term|location=frontend|cmd=<code class="command"> ssh root@$service_node </code>}}
 
 
 
= Usage =
 
= In Practice: 10240 Virtual Machines on 512 Physical Hosts =
 
Considering that physical machines must support hardware virtualization
to start KVM instances, the largest experiment conducted so far involved
10240 KVM instances on 512 nodes across 4 sites and 10 clusters. The whole
setup is performed in less than 30 minutes, with about 10 minutes spent on
the deployment of the nodes and 5 minutes on the installation and
configuration of the required packages on the physical hosts, while the rest
is dedicated to booting the virtual machines. This work opens the door to
manipulating virtual machines throughout a distributed infrastructure the
way traditional operating systems handle processes on a local node.
 
= Useful Tips =
 
== Booking grid resources ==
 
* Build the request that retrieves the maximum number of nodes available during the defined time slot on specific clusters
 
disco -s '2012-07-24 19:00:00' -w 1 griffon graphene
 
== Deployments ==
 
* Ensure a minimum amount of deployed nodes
 
To ensure that 95% (rounded down) of the reserved nodes are correctly deployed (3 attempts max), instead of running:
 
kadeploy3 -e squeeze-x64-prod -f $OAR_NODEFILE -k
 
Run:
 
NB_NODES=$(sort -u $OAR_NODEFILE | wc -l)
MIN_NODES=$(($NB_NODES * 95/100))
/grid5000/code/bin/katapult3 --deploy-env squeeze-x64-prod --copy-ssh-key --min-deployed-nodes $MIN_NODES --max-deploy-runs 3
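The 95% threshold is computed with integer (hence floor) arithmetic. For example, with 512 reserved nodes:

```shell
# Floor of 95% of the reserved nodes, using shell integer arithmetic.
NB_NODES=512
MIN_NODES=$(( NB_NODES * 95 / 100 ))
echo "$MIN_NODES"    # 512 * 95 / 100 = 486.4, floored to 486
```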
 
== Communication ==
 
* About the Saturation of ARP tables
[[ARP_table_overflow_tuning]]
 
[[TCP_bandwidth_tuning]]
 
== Tools ==
 
* Network isolation over Grid'5000: [[KaVLAN|KaVLAN]]
* Booking a range of IP addresses: [[Subnet reservation|Subnet Reservation System]]
* Deployment of nodes: [[Kadeploy|Kadeploy3]]
 
= Contact =
 
* Adrien Lèbre,  Associate Professor at Mines Nantes, Ecole des mines de Nantes, Ascola Team | [mailto:adrien.lebre@inria.fr mail]
* Daniel Balouek, Ingénieur Héméra, ENS Lyon, Inria, Avalon Team | [mailto:daniel.balouek@inria.fr mail]
* Flavien Quesnel, PhD student, Ecole des mines de Nantes, Ascola Team | [mailto:flavien.quesnel@inria.fr mail]

Revision as of 15:18, 22 August 2012

