Virtualization in Grid'5000

The next part is optional: it explains in detail the contextualization mechanism, which allows you to customize your virtual machines.
In the ''Multi-site experiment'' section, we will deploy 2 VMs on 2 sites and measure the network bandwidth between them with iperf.


= Prerequisite: Network subnets reservation with g5k-subnets =
== Disk image, virtual machine ==


A disk image containing Debian 11 Bullseye is available at the following path:
<code>/grid5000/virt-images/debian11-x64-base.qcow2</code>


You can copy it to the node; it will be the base image for our VMs:
{{Term|location=node|cmd=<code class="command">cp /grid5000/virt-images/debian11-x64-base.qcow2 /tmp/</code>}}


If we want to create multiple VMs, we have to copy the qcow2 file once per VM.<br/>
To save storage space, we can instead use <code>debian11-x64-base.qcow2</code> as a backing file:
{{Term|location=node|cmd=<code class="command">qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/domain1.qcow2</code>}}
By doing this, domain1.qcow2 only stores the difference from debian11-x64-base.qcow2 (and not the whole image).<br/>
If you want to create a second virtual machine based on the same image, simply run the same command with <code>domain2.qcow2</code> instead of <code>domain1.qcow2</code>.


   <memory>2048000</memory>
   <vcpu>1</vcpu>
   <cpu mode='host-model'/>
   <os>
     <type arch="x86_64">hvm</type>
; You can also use the qemu/kvm command to start the virtual machine:


{{Term|location=node|cmd=<code class="command">qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2048 -rtc base=localtime -no-acpi -drive file=/tmp/debian11-x64-base.qcow2,cache=unsafe,media=disk,if=virtio,id=drive0 -netdev bridge,id=br0 -device virtio-net-pci,netdev=br0,mac=</code><code class="replace">AA:BB:CC:DD:EE:FF</code><code class="command"> -nographic</code>}}


Or with a VNC display:


{{Term|location=node|cmd=<code class="command">qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2048 -rtc base=localtime -no-acpi -drive file=/tmp/debian11-x64-base.qcow2,cache=unsafe,media=disk,if=virtio,id=drive0 -netdev bridge,id=br0 -device virtio-net-pci,netdev=br0,mac=</code><code class="replace">AA:BB:CC:DD:EE:FF</code><code class="command"> -device cirrus-vga,id=video0 -vnc :0</code>}}


These are example commands. Feel free to adapt them to your use case. (The qemu process can be launched in a <code class="command">screen</code> session in order to be detachable. If you are not familiar with screen, read its [[Screen|documentation]].)
= Contextualize your VMs with cloud-init =
As we have seen, we must use the console of our VM to configure SSH before we can connect to it.
This is a bit annoying if we have many VMs: we would have to manually configure SSH on each instance within the console.


This part describes how to contextualize your VM using cloud-init.<br/>
We will run a new VM with contextualization.
First we create a new disk image from our base image:
{{Term|location=node:/tmp|cmd=<code class="command">qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/contextualized-domain.qcow2</code>}}


We create a new <code class="file">contextualized-domain.xml</code> with this content:
   <memory>2048000</memory>
   <vcpu>1</vcpu>
   <cpu mode='host-model'/>
   <os>
     <type arch="x86_64">hvm</type>
{{Term|location=node:/tmp|cmd=<code class="command">genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data</code>}}


We will destroy our running VM, create a fresh disk from debian11-x64-base.qcow2 and restart it:


{{Term|location=node:/tmp|cmd=<code class="command">virsh destroy contextualized-domain</code>}}
{{Term|location=node:/tmp|cmd=<code class="command">qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/contextualized-domain.qcow2</code>}}
{{Term|location=node:/tmp|cmd=<code class="command">virsh create contextualized-domain.xml</code>}}


You can create a new VM to test this new configuration:
{{Term|location=node:/tmp|cmd=<code class="command">virsh destroy contextualized-domain</code>}}
{{Term|location=node:/tmp|cmd=<code class="command">qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/contextualized-domain.qcow2</code>}}
{{Term|location=node:/tmp|cmd=<code class="command">virsh create contextualized-domain.xml</code>}}


=== Copy a standard virtual machine image ===


Copy the default virtual machine image from <code class="file">/grid5000/virt-images/debian11-x64-base.qcow2</code> to <code class="file">/tmp</code> on '''both''' nodes:

{{Term|location=nodes(both)|cmd=<code class="command">cp /grid5000/virt-images/debian11-x64-base.qcow2 /tmp/</code>}}


=== Configure cloud-init ===
   <memory>2048000</memory>
   <vcpu>1</vcpu>
   <cpu mode='host-model'/>
   <os>
     <type arch="x86_64">hvm</type>
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/tmp/debian11-x64-base.qcow2'/>
       <target dev='vda' bus='virtio'/>
     </disk>
== Support for SR-IOV on Grid'5000 ==


As of June 2021, SR-IOV is enabled in the BIOS of most clusters.


However, SR-IOV must also be enabled at all layers of the stack: network interface firmware, Linux kernel, driver. An overview of which clusters have effective SR-IOV support can be seen at: [[Hardware#Nodes_with_SR-IOV_support]]. This table also lists the maximum number of VFs that can be enabled on each interface.


Support for Mellanox Ethernet cards is planned. Support for Intel i40e cards is currently not possible ({{Bug|12695}}).


[[Support|Contact us]] if you have requirements that are not covered by the current state.
However, here is an overview of the required steps on Grid'5000:


* You must [[Advanced_Kadeploy|deploy an environment]] to have full control on the environment, such as the kernel command line. Using <code>debian11-x64-big</code> is a good start so that qemu is already installed.
* To enable PCI passthrough, you will probably need to pass parameters to the kernel command line: see [[CPU_parameters#Setting_kernel_parameters_with_Kadeploy]]
* Once booted, create virtual functions with <code>echo X > /sys/class/net/$IFACE/device/sriov_numvfs</code>
* Configure libvirt or qemu with PCI passthrough so that the virtual machine takes ownership of the Virtual Function
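The last step can be sketched as a libvirt device entry. A hypothetical <code><hostdev></code> fragment for the <code><devices></code> section of the domain XML (the PCI address shown is an example; use the Virtual Function address reported by <code>lspci</code> on your node):

```xml
<!-- Hypothetical example: PCI passthrough of one Virtual Function.
     Adapt bus/slot/function to the VF address shown by lspci. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
</hostdev>
```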


= Alternative: running qemu in user space using the Grid5000 create_tap script =

Thanks to the <code>create_tap</code> script, it is possible as a normal user to create a bridged VM using qemu, without requiring the usage of <code>sudo-g5k</code>.

Connect to a frontend and reserve a node and a dedicated network for the VM:

 ssh nancy.g5k
 oarsub -I -l slash_22=1+{"virtual!='NO'"}/nodes=1,walltime=2:00:00
 oarsub -C

Create a minimalistic image file:

 qemu-img create -f qcow2 test.img 1G

Call <code>create_tap</code> to create a <code>tap0</code> device:

 sudo create_tap

'''NB:''' do not use <code>sudo-g5k</code> but <code>sudo</code> instead, as <code>create_tap</code> is a sudoable command granted for all users.

Launch the virtual machine:

 qemu-system-x86_64 -nographic -m 1G -hda test.img -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,disable-modern=off,netdev=mynet0,mac=$(g5k-subnets -i -m |head -1|awk '{print $2}')

Press Ctrl-B quickly to access the iPXE command line, then run the following commands to get and check the expected IP:

 dhcp net0
 show net0/ip

To terminate the qemu session, simply use the <code>Ctrl-A X</code> keystroke.

Latest revision as of 09:17, 2 July 2024

Note:

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

Purpose

This page presents how to use KVM on the standard environment (with a "non-deploy" reservation). The aim is to allow the execution of virtual machines on the nodes, along with a subnet reservation, which gives you a range of routed IPs for your experiment.

In the first part, you will learn the basics of g5k-subnets, which is a prerequisite for the rest of this tutorial. The Quick start explains how to run a VM on the standard environment in a minimal number of steps. The next part is optional: it explains in detail the contextualization mechanism, which allows you to customize your virtual machines. In the Multi-site experiment section, we will deploy 2 VMs on 2 sites and measure the network bandwidth between them with iperf.

Prerequisite: Network subnets reservation with g5k-subnets

Users deploying VMs on Grid'5000 need to assign IP addresses to them. Each site of Grid'5000 is allocated a /14 block for this purpose, divided into 4 smaller blocks.

OAR can be used to reserve a range of IPs. It allows the IP resources to be shared among users while avoiding potential IP conflicts.

More details are available on the Subnet reservation page.

Reservation

A subnet reservation is done similarly to a compute resource reservation, in OAR.

To reserve 4 /22 subnets and 2 nodes, just type:

frontend:
oarsub -l slash_22=4+{"virtual!='NO'"}/nodes=2 -I

You can of course make more complex requests. To obtain 4 /22 ranges on different /19 subnets, you can type:

frontend:
oarsub -l slash_19=4/slash_22=1+{"virtual!='NO'"}/nodes=2/core=1 -I

To request a node from a specific cluster, advanced OAR usage is needed:

frontend:
oarsub -I -l slash_22=1+{"virtual!='NO' AND cluster='nova'"}/nodes=1,walltime=2:00:00

Usage

The simplest way to get the list of your allocated subnets is to use the g5k-subnets script provided on the head node of the submission.

# g5k-subnets
10.8.0.0
10.8.8.0

Several other printing options are available (-p option to display the CIDR format, -b to display broadcast address, -n to see the netmask, and -a is equivalent to -bnp):

# g5k-subnets -a
10.8.0.0/21	10.11.255.255	255.255.252.0	10.11.255.254
10.8.8.0/21	10.11.255.255	255.255.252.0	10.11.255.254

You can also summarize the subnets into a larger one if they are contiguous:

# g5k-subnets -sp
10.8.0.0/20
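The summarization works because four contiguous /22 subnets contain exactly as many addresses as one /20. A quick arithmetic check in shell:

```shell
# Each /22 holds 2^(32-22) = 1024 addresses; four of them hold 4096,
# which is 2^12 addresses, i.e. a /20 prefix (32 - 12 = 20).
echo $(( 4 * (1 << (32 - 22)) ))   # total addresses: 4096
echo $(( 32 - 12 ))                # summarized prefix length: 20
```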

You can display all the available IPs in your reservation, and their associated unique MAC addresses, with the following command:

# g5k-subnets -im
10.158.16.1     00:16:3E:9E:10:01
...
Note:

For detailed information, see the Subnet reservation page. The Grid5000:Network page also describes our organization of the virtual IP space inside Grid'5000.

Quick start

In this part, we will create a virtual machine in a few steps, and ssh to it.

Job submission

To easily test the KVM environment, we use an interactive job and reserve one subnet and one node with hardware virtualization capabilities.

frontend:
oarsub -I -l slash_22=1+{"virtual!='NO'"}/nodes=1

Disk image, virtual machine

A disk image containing Debian 11 Bullseye is available at the following path: /grid5000/virt-images/debian11-x64-base.qcow2

You can copy it to the node; it will be the base image for our VMs:

node:
cp /grid5000/virt-images/debian11-x64-base.qcow2 /tmp/

If we want to create multiple VMs, we have to copy the qcow2 file once per VM.
To save storage space, we can instead use debian11-x64-base.qcow2 as a backing file:

node:
qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/domain1.qcow2

By doing this, domain1.qcow2 only stores the difference from debian11-x64-base.qcow2 (and not the whole image).
If you want to create a second virtual machine based on the same image, simply run the same command with domain2.qcow2 instead of domain1.qcow2.
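Creating overlays for many VMs can be scripted. A minimal sketch, shown as a dry run that only prints the qemu-img invocations (on the node, remove the leading echo to actually create the files):

```shell
# Dry-run sketch: print one qemu-img invocation per overlay disk.
# Remove the "echo" to actually create the overlays on the node.
BASE=/tmp/debian11-x64-base.qcow2
for i in 1 2 3; do
  echo qemu-img create -f qcow2 -o backing_file="$BASE" "/tmp/domain$i.qcow2"
done
```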

Choose a MAC address

As seen before, g5k-subnets maintains a correspondence between MAC addresses and IP addresses. The Debian system provided on the disk image is configured to use DHCP and the DHCP server will assign the IP corresponding to the MAC address of the virtual machine.

Consequently, you have to choose an IP in the range you have reserved, and set the MAC address of the VM to the associated MAC address.

You can get the list of available IPs, and their associated unique MAC addresses, with the following command:

node:
g5k-subnets -im
10.172.0.1      00:16:3E:AC:00:01
10.172.0.2      00:16:3E:AC:00:02
10.172.0.3      00:16:3E:AC:00:03
10.172.0.4      00:16:3E:AC:00:04
10.172.0.5      00:16:3E:AC:00:05
10.172.0.6      00:16:3E:AC:00:06
10.172.0.7      00:16:3E:AC:00:07
10.172.0.8      00:16:3E:AC:00:08
10.172.0.9      00:16:3E:AC:00:09
10.172.0.10     00:16:3E:AC:00:0A
...
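The listing above suggests the mapping rule: the MAC is the 00:16:3E prefix followed by the last three IP octets in hexadecimal. A small sketch of that observation (a pattern read from the listing, not an official guarantee; g5k-subnets -im remains the authoritative source):

```shell
# Sketch: derive the MAC matching a reserved IP, following the pattern
# visible in the listing (00:16:3E + last three octets in hex).
ip_to_mac() {
  echo "$1" | awk -F. '{ printf "00:16:3E:%02X:%02X:%02X\n", $2, $3, $4 }'
}
ip_to_mac 10.172.0.10   # prints 00:16:3E:AC:00:0A
```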

Run the guest OS using libvirt

Libvirt is a toolkit for managing virtualization servers. It is also an abstraction layer for different virtualization solutions, including KVM but also Xen and VMware ESX.

In our case, we use libvirt on top of KVM.

  • Create a domain file in XML, describing a virtual machine.

e.g. domain1.xml

<domain type='kvm'>
  <name>domain1</name>
  <memory>2048000</memory>
  <vcpu>1</vcpu>
  <cpu mode='host-model'/>
  <os>
    <type arch="x86_64">hvm</type>
  </os>
  <clock offset="localtime"/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/domain1.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
      <mac address='AA:BB:CC:DD:EE:FF'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/ttyS0'/>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <source path='/dev/ttyS0'/>
      <target port='0'/>
    </console>
  </devices>
</domain>
Note:

- Libvirt will create a virtual interface and attach it to the bridge br0, so your VM can reach the rest of Grid'5000 and access the internet
- Adapt this file to your case: you must change the "mac address" field to one of the g5k-subnets addresses
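Rather than editing the MAC by hand, the domain file can be patched from the g5k-subnets output. A minimal sketch (the patch_mac helper is ours, not a Grid'5000 tool; it assumes the domain file still contains the AA:BB:CC:DD:EE:FF placeholder):

```shell
# Sketch: substitute the MAC of the first reserved IP into the domain file.
# Usage on the node:  g5k-subnets -im | patch_mac domain1.xml
patch_mac() {
  mac=$(head -1 | awk '{print $2}')          # MAC column of the first line
  sed -i "s/AA:BB:CC:DD:EE:FF/$mac/" "$1"    # replace the placeholder
}
```

You would then start the VM as usual with virsh create domain1.xml.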

Now, we can run and manage our guest OS with virsh.

  • Run the guest with the following command:
node:
virsh create domain1.xml
  • We can see our guest is currently running:
node:
virsh list
Id    Name                 State
---------------------------------------
1     domain1              running
  • You can connect to your VM console:
    • The default root password is grid5000
    • Use CTRL+] to disconnect from the virsh console (or CTRL+ALT+6 on macOS)
node:
virsh console domain1
Note:

To have a Graphical user interface (GUI) for the VM, you can configure a VNC access. To do so, simply add the following line to domain1.xml (at line 31, after </console> and before </devices>):

<graphics type='vnc' listen='0.0.0.0' autoport='yes' keymap='en-us'/>

Get the VNC port used by the VM:

node:
virsh vncdisplay domain1

The answer should be something like :PORT, e.g. :0 for the first VNC display opened, or :N for the Nth VNC display opened.

Then, you need to create a SSH tunnel to access the VM from your personal computer.

Your laptop:
ssh -L PORT:NODE:PORT -N SITE.g5k

With:

  • NODE: the node name, e.g. dahu-32
  • PORT: the port given above, e.g. 0
  • SITE: the Grid'5000 site you are using, e.g. grenoble.
Finally, use a VNC client on your personal computer to access the VM. The hostname to use is localhost:PORT.
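By VNC convention, display :N listens on TCP port 5900+N, which gives the actual TCP port behind the display number printed by virsh vncdisplay. A small helper sketch (the helper name is ours, not a Grid'5000 tool):

```shell
# Sketch: convert a display number printed by `virsh vncdisplay`
# (e.g. ":0") into the TCP port it listens on (5900 + N).
vnc_port() { echo $(( 5900 + ${1#:} )); }
vnc_port :0   # prints 5900
```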

  • At this point, you can repeat the full process and launch several VMs in parallel.
  • Stop the execution of your VM with:
node:
virsh destroy domain1

Run the guest OS using the qemu/kvm command

You can also use the qemu/kvm command to start the virtual machine:
node:
qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2048 -rtc base=localtime -no-acpi -drive file=/tmp/debian11-x64-base.qcow2,cache=unsafe,media=disk,if=virtio,id=drive0 -netdev bridge,id=br0 -device virtio-net-pci,netdev=br0,mac=AA:BB:CC:DD:EE:FF -nographic

Or with a VNC display:

node:
qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2048 -rtc base=localtime -no-acpi -drive file=/tmp/debian11-x64-base.qcow2,cache=unsafe,media=disk,if=virtio,id=drive0 -netdev bridge,id=br0 -device virtio-net-pci,netdev=br0,mac=AA:BB:CC:DD:EE:FF -device cirrus-vga,id=video0 -vnc :0

These are example commands. Feel free to adapt them to your use case. (The qemu process can be launched in a screen session in order to be detachable. If you are not familiar with screen, read its documentation.)

SSH to your virtual machine

Root SSH authentication with a password is disabled by default. To SSH to your VM, follow these steps:

  1. Log into your VM console using virsh console domain1. The root password is grid5000
  2. Run these commands to allow root login with a password in the SSH configuration, and restart the SSH daemon:
VM:
echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
VM:
systemctl restart ssh.service

Finally, you can ssh directly to your VM from anywhere in Grid'5000:

node:
ssh root@g5k-subnet_ip_addr

Contextualize your VMs with cloud-init

As we have seen, we must use the console of our VM to configure SSH before we can connect to it. This is a bit annoying if we have many VMs: we would have to manually configure SSH on each instance within the console.

This part describes how to contextualize your VM using cloud-init.
cloud-init ( https://cloudinit.readthedocs.io/en/latest/ ) runs at VM startup and searches for a datasource providing configuration to apply to the VM, such as:

  • Set the hostname
  • Create users
  • Copy SSH key to root account
  • Mount a device
  • Execute a script
  • ...

This is the contextualization.

On Grid'5000, this datasource is a virtual disk (.iso) that contains the configurations we want.

Create a virtual disk for cloud-init

In this example, we will create a CD containing simple contextualization configuration for cloud-init: It will change the hostname of the VM and add your public SSH key to the root account.

To help you create cloud-init configuration files, a script cloud-init-example.sh is available; copy it to your node:

Terminal.png node:
cp /grid5000/virt-images/cloud-init-example.sh /tmp/

This script generates basic configuration files for cloud-init that add your public SSH key to the root account, so that you can SSH to the VM without a password and without using the console.

Terminal.png node:
cd /tmp && export cloud_init_key=$(cat ~/.ssh/id_rsa.pub) && ./cloud-init-example.sh
Note.png Note

The previous command assumes your SSH public key is in ~/.ssh/id_rsa.pub. If not, adjust the path in the command.

Two files were created in cloud-init-data: meta-data and user-data.

  • meta-data contains configuration such as the hostname, root SSH key, instance id, etc. You can see the script wrote your SSH public key in this file.
  • user-data can contain further configuration in different formats
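As a rough sketch of what these two seed files look like (the hostname, instance id and key below are placeholders; the exact content generated by cloud-init-example.sh may differ):

```shell
# Minimal NoCloud seed files; the values are placeholders, not the
# exact output of cloud-init-example.sh.
mkdir -p cloud-init-data
cat > cloud-init-data/meta-data <<'EOF'
instance-id: iid-example-vm
local-hostname: example-vm
public-keys:
  - ssh-rsa AAAA...placeholder... user@host
EOF
cat > cloud-init-data/user-data <<'EOF'
#cloud-config
disable_root: false
EOF
```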

Now we can generate an iso file with the following command:

Terminal.png node:/tmp:
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data

The file cloud-init-data.iso is ready to be attached to a VM.
Cloud-init will detect the disk on startup and configure the virtual machine using the information in meta-data and user-data on the CD.

Start a VM with contextualization

We will run a new VM with contextualization. First, create a new disk image from our base image:

Terminal.png node:/tmp:
qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2,backing_fmt=qcow2 /tmp/contextualized-domain.qcow2

Create a new contextualized-domain.xml with this content:

<domain type='kvm'>
  <name>contextualized-domain</name>
  <memory>2048000</memory>
  <vcpu>1</vcpu>
  <cpu mode='host-model'/>
  <os>
    <type arch="x86_64">hvm</type>
  </os>
  <clock offset="localtime"/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/contextualized-domain.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/tmp/cloud-init-data.iso'/>
      <target dev='vdb' bus='virtio'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
      <mac address='AA:BB:CC:DD:EE:FF'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/ttyS0'/>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <source path='/dev/ttyS0'/>
      <target port='0'/>
    </console>
  </devices>
</domain>

Note that this is the same XML file as in the previous section, except for the added cdrom device, which mounts the iso in the VM as a CD-ROM.

Note.png Note

- Don't forget to replace the MAC address with one of your g5k-subnets MAC addresses
- If you want to use the kvm command instead of virsh, add the -cdrom /tmp/cloud-init-data.iso option

Start the guest OS and connect it using ssh :

Terminal.png node:
virsh create contextualized-domain.xml
Terminal.png node:
ssh root@g5k-subnet_ip_addr

You can now SSH to your VM without a password, and without having to use the VM console. Notice that the hostname also changed to example-vm, as specified in the meta-data file.

Playing with cloud-init

This optional part shows more examples of what can be done with cloud-init.

Run a script on startup with user-data

The current content of user-data is :

#cloud-config
disable_root: false

By default, cloud-init disables the root account.
If you try to connect as root to the VM without enabling the root account in user-data,
you will get a message saying you need to connect as user debian (your public key will be accepted for user debian). This user has sudo rights.

The user-data file starts with #cloud-config, telling cloud-init that the file is in cloud-config format. As we will see in the next part, it is a YAML file that describes the configuration cloud-init has to apply at boot.

But user-data can also be a bash script, and that's what we will do here. Replace the content of user-data with:

#!/bin/bash
apt-get update && apt-get install -y lighttpd
cat << EOF > /var/www/html/index.lighttpd.html
<!DOCTYPE html>
<html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>VM HTTP Server</title>
  </head>
  <body>
    <h1> Installed and configured with cloud-init </h1>
  </body>
</html>
EOF

Generate the iso file with this new configuration :

Terminal.png node:/tmp:
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data

We will destroy our running VM, create a fresh disk from debian11-x64-base.qcow2 and restart it :

Terminal.png node:/tmp:
virsh destroy contextualized-domain
Terminal.png node:/tmp:
qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2,backing_fmt=qcow2 /tmp/contextualized-domain.qcow2
Terminal.png node:/tmp:
virsh create contextualized-domain.xml

After a few moments, we can try:

Terminal.png node:/tmp:
curl http://vm_ip

The script in user-data ran on startup. It installed lighttpd, a small HTTP server, and replaced the default index.html.

We can still SSH to our VM, but not as root, since we removed the option that enabled the root account in user-data:

Terminal.png node:/tmp:
ssh debian@vm_ip
Terminal.png debian@vm:
sudo su
Terminal.png root@vm:
#

Going further with user-data in YAML

We have seen how to use user-data as a startup script.
We will now use it in cloud-config format: a description file in YAML that triggers actions at startup.

Copy the following content to user-data and insert your public SSH key where needed. Then regenerate the iso file :

#cloud-config
groups:
  - foo
  - bar
users:
  - name: foo
    primary-group: foo
    groups: users
    shell: /bin/bash
    ssh-authorized-keys:
      - <insert your public key here>
  - name: bar
    primary-group: bar
    groups: users
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh-authorized-keys:
      - <insert your public key here>
packages:
  - lighttpd
Terminal.png node:/tmp:
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data

The advantage of this format is readability: we can quickly identify what it will do:

  • Create a user foo in group foo (with no sudo rights)
  • Create a user bar in group bar with sudo rights
  • Install the package lighttpd
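If you prefer to script the key substitution rather than editing user-data by hand, here is a small sketch. The placeholder marker matches the one used above; normally PUBKEY would come from ~/.ssh/id_rsa.pub, but a dummy value is used here for illustration:

```shell
# Replace the "<insert your public key here>" placeholder in user-data
# with an actual public key. PUBKEY is a dummy value here; normally you
# would use: PUBKEY=$(cat ~/.ssh/id_rsa.pub)
PUBKEY='ssh-rsa AAAAexample user@host'
cat > user-data <<'EOF'
#cloud-config
users:
  - name: foo
    shell: /bin/bash
    ssh-authorized-keys:
      - <insert your public key here>
EOF
sed -i "s|<insert your public key here>|$PUBKEY|" user-data
```

Then regenerate the iso file as usual with genisoimage.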

You can create a new VM to test this new configuration :

Terminal.png node:/tmp:
virsh destroy contextualized-domain
Terminal.png node:/tmp:
qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2,backing_fmt=qcow2 /tmp/contextualized-domain.qcow2
Terminal.png node:/tmp:
virsh create contextualized-domain.xml

You'll be able to connect without a password as foo and bar:

Terminal.png node:/tmp:
ssh bar@vm_ip

For more examples of what can be done with cloud-config, see http://cloudinit.readthedocs.io/en/latest/topics/examples.html

Add cloud-init to your own image

You may want to add cloud-init to your own virtual image so you can run custom VMs on Grid'5000.

To do so, simply run

Terminal.png VM:
apt-get update && apt-get install cloud-init

in your VM before exporting it as qcow2.

You can also use virt-customize for an already existing qcow2 :

Terminal.png local_pc:
apt-get update && apt-get install libguestfs-tools
Terminal.png local_pc:
virt-customize -a my_image.qcow2 --install cloud-init

Multi-site experiment

In this part, to illustrate what can be done using Virtual machines on the standard environment, we will start two virtual machines on two sites, and make them communicate using the virtualization network.

Reservation

Open two terminals and SSH to the frontends of two sites; in this example, the frontends of Luxembourg and Nancy. Then, reserve two virtualization-capable nodes and two subnets on the two sites.

For the rest of the multi-site experiment part, don't forget to run each command on both sites.

Terminal.png frontends(both):
oarsub -I -l slash_22=1+{"virtual!='NO'"}/nodes=1

Network configuration

In this part, we will choose an IP for the 2 virtual machines.

Choose an IP/MAC couple for each VM from the output of g5k-subnets -im. Note that g5k-subnets returns completely different information on each site. In the following, we assume that you chose 10.144.8.1 (00:16:3e:90:08:01) in Nancy, and 10.172.0.1 (00:16:3e:ac:00:01) in Luxembourg.

Terminal.png nodes(both):
g5k-subnets -im | head
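To pick a couple programmatically, you can parse the two columns of the g5k-subnets -im output with awk. The sketch below works on a captured sample line, since the real command is only available on Grid'5000 nodes; the column order (IP, then MAC) is as shown above:

```shell
# Parse the first "IP MAC" couple. On a node, replace the sample with:
#   sample=$(g5k-subnets -im | head -1)
sample='10.144.8.1 00:16:3e:90:08:01'
VM_IP=$(echo "$sample" | awk '{print $1}')
VM_MAC=$(echo "$sample" | awk '{print $2}')
echo "VM will use $VM_IP ($VM_MAC)"
```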

Instantiate your VMs

Copy a standard virtual machine image

Copy the default virtual machine image from /grid5000/virt-images/debian11-x64-base.qcow2 to /tmp on both nodes :

Terminal.png nodes(both):
cp /grid5000/virt-images/debian11-x64-base.qcow2 /tmp/

Configure cloud-init

To be able to SSH without a password, we will use cloud-init:

Terminal.png node(both):
cp /grid5000/virt-images/cloud-init-example.sh /tmp/
Terminal.png node(both):
cd /tmp && export cloud_init_key=$(cat ~/.ssh/id_rsa.pub) && ./cloud-init-example.sh
Terminal.png node(both):
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data

Create the domain.xml file

The domain.xml file contains the description of your virtual machine. Create it on both sides and adapt it to use a MAC address provided by g5k-subnets -im. The virtual machine will get the IP associated with its MAC address:

<domain type='kvm'>
  <name>bullseye</name>
  <memory>2048000</memory>
  <vcpu>1</vcpu>
  <cpu mode='host-model'/>
  <os>
    <type arch="x86_64">hvm</type>
  </os>
  <clock offset="localtime"/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/debian11-x64-base.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/tmp/cloud-init-data.iso'/>
      <target dev='vdb' bus='virtio'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
      <mac address='AA:BB:CC:DD:EE:FF'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/ttyS0'/>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <source path='/dev/ttyS0'/>
      <target port='0'/>
    </console>
  </devices>
</domain>

Launch the two VMs

Terminal.png nodes(both):
virsh create domain.xml

Enjoy!

SSH in your VMs

Terminal.png node(nancy):
ssh root@10.144.8.1
Terminal.png node(luxembourg):
ssh root@10.172.0.1

Install and run iperf

Finally, we will install iperf and measure the bandwidth between the two VMs:

  • install iperf with apt-get ;
  • then, run iperf in server mode (-s parameter) on one node, and in client mode (-c parameter) on the other.
Terminal.png vm(nancy):
apt-get update && apt-get install iperf
Terminal.png vm(nancy):
iperf -s
root@vm-1:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.144.8.1 port 5001 connected with 10.172.0.1 port 52389
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.09 GBytes    938 Mbits/sec
Terminal.png vm(luxembourg):
apt-get update && apt-get install iperf
Terminal.png vm(luxembourg):
iperf -c 10.144.8.1
root@vm-1:~# iperf -c 10.144.8.1
------------------------------------------------------------
Client connecting to 10.144.8.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.172.0.1 port 52389 connected with 10.144.8.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes    938 Mbits/sec

SR-IOV and Virtual Function PCI passthrough

With SR-IOV, modern network interface cards can create "Virtual Functions": they are basically virtual network interfaces that appear as separate PCI devices.

An interesting usage is to allocate each virtual function to a virtual machine, using PCI passthrough. This way, the virtual machine has direct access to the hardware for maximum performance, while the physical interface can still be shared between many different virtual machines.

Support for SR-IOV on Grid'5000

As of June 2021, SR-IOV is enabled in the BIOS of most clusters.

However, SR-IOV must also be enabled at all layers of the stack: network interface firmware, Linux kernel, driver. An overview of which clusters have effective SR-IOV support can be seen at: Hardware#Nodes_with_SR-IOV_support. This table also lists the maximum number of VFs that can be enabled on each interface.

Support for Mellanox Ethernet cards is planned. Support for Intel i40e cards is currently not possible (bug #12695).

Contact us if you have requirements that are not covered by the current state.

Using SR-IOV on Grid'5000

As this is an advanced virtualisation usage, the Grid'5000 team cannot maintain a detailed tutorial. There are good SR-IOV resources on the Internet.

However, here is an overview of the required steps on Grid'5000:

  • You must deploy an environment to have full control over it (in particular over the kernel command line). Using debian11-x64-big is a good start, as qemu is already installed.
  • To enable PCI passthrough, you will probably need to pass parameters to the kernel command line: see CPU_parameters#Setting_kernel_parameters_with_Kadeploy
  • Once booted, create virtual functions with echo X > /sys/class/net/$IFACE/device/sriov_numvfs
  • Choose a MAC address in your subnet reservation (g5k-subnets -im from your reservation on the frontend)
  • Change the MAC address of your Virtual Function interface (e.g. ip link set eno1v0 address AA:BB:CC:DD:EE:FF)
  • Identify the PCI address of your Virtual Function interface (for instance with lshw -c network -businfo)
  • Configure libvirt or qemu with PCI passthrough so that the virtual machine takes ownership of the Virtual Function
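For the last step, libvirt needs a <hostdev> element describing the Virtual Function's PCI address. Here is a hedged sketch that generates one; the PCI address 0000:3b:02.0 is a placeholder, to be replaced with the one reported by lshw:

```shell
# Split a PCI address like 0000:3b:02.0 into domain/bus/slot/function
# and emit the corresponding libvirt <hostdev> element.
PCI='0000:3b:02.0'   # placeholder; use the address of your VF
IFS=':.' read -r dom bus slot func <<EOF
$PCI
EOF
cat > hostdev.xml <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x$dom' bus='0x$bus' slot='0x$slot' function='0x$func'/>
  </source>
</hostdev>
EOF
```

The generated element can then be added to the <devices> section of your domain XML.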

Alternative: running qemu in user space using the Grid5000 create_tap script

Thanks to the create_tap script, a normal user can create a bridged VM using qemu, without requiring sudo-g5k.

Connect to a frontend and reserve a node and a dedicated network for the VM:

ssh nancy.g5k 
oarsub -I -l slash_22=1+{"virtual!='NO'"}/nodes=1,walltime=2:00:00
oarsub -C

Create a minimalistic image file:

qemu-img create -f qcow2 test.img 1G

Call create_tap to create a tap0 device:

sudo create_tap

NB: use sudo rather than sudo-g5k here, as create_tap is a sudo-allowed command granted to all users.

Launch the virtual machine:

qemu-system-x86_64 -nographic -m 1G -hda test.img -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,disable-modern=off,netdev=mynet0,mac=$(g5k-subnets -i -m |head -1|awk '{print $2}')

Press Ctrl-B quickly to access the iPXE command line, then run the following commands to request and check the expected IP:

dhcp net0
show net0/ip

To terminate the qemu session, simply use the Ctrl-A X keystroke.