Virtualization in Grid'5000
Note: This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.
Purpose
This page explains how to use KVM on the standard environment (with a "non-deploy" reservation). The aim is to run virtual machines on the nodes, along with a subnet reservation that gives you a range of routed IP addresses for your experiment.
In the first part, you will learn the basics of g5k-subnets, which is a prerequisite for the rest of this tutorial. The Quick start explains how to run a VM on the standard environment in the minimal number of steps. The next part is optional; it explains the contextualization mechanism in detail, which allows you to customize your virtual machines. In the Multi-site experiment section, we will deploy 2 VMs on 2 sites and measure the network bandwidth between them with iperf.
Prerequisite: Network subnets reservation with g5k-subnets
Users deploying VMs on Grid'5000 need to assign IP addresses to them. Each site of Grid'5000 is allocated a /14 block for this purpose, divided into 4 smaller blocks.
OAR can be used to reserve a range of IPs. It shares the IP resources among users while avoiding potential IP conflicts.
More details are available on the Subnet reservation page.
Reservation
A subnet reservation is done similarly to a compute resource reservation, in OAR.
To reserve 4 /22 subnets and 2 nodes, just type:
You can of course make more complex requests. To obtain 4 /22 subnets in different /19 blocks, you can type:
To request a node from a specific cluster, advanced OAR usage is needed:
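All three reservations are oarsub calls that differ only in their -l argument. The sketch below uses hypothetical helpers that print that argument. It assumes the resource syntax of the oarsub example at the end of this page (slash_22=N+{PROPERTIES}/nodes=M) and the slash_19 level described on the Subnet reservation page. Check the exact syntax against that page before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers printing the -l argument of oarsub.
# Syntax modeled on the oarsub example at the end of this page.

# $1: number of /22 subnets, $2: number of nodes,
# $3 (optional): OAR property filter for the nodes, e.g. "cluster='NAME'"
subnet_request() {
  local subnets=$1 nodes=$2 filter=${3:-}
  if [[ -n $filter ]]; then
    printf 'slash_22=%s+{%s}/nodes=%s\n' "$subnets" "$filter" "$nodes"
  else
    printf 'slash_22=%s+nodes=%s\n' "$subnets" "$nodes"
  fi
}

# $1 /22 subnets, each one in a different /19 block, plus $2 nodes
spread_subnet_request() {
  printf 'slash_19=%s/slash_22=1+nodes=%s\n' "$1" "$2"
}

# Usage, on a frontend (CLUSTER is a placeholder):
#   oarsub -I -l "$(subnet_request 4 2)"
#   oarsub -I -l "$(spread_subnet_request 4 2)"
#   oarsub -I -l "$(subnet_request 4 2 "cluster='CLUSTER'")"
#   oarsub -I -l "$(subnet_request 1 1 "virtual!='NO'"),walltime=2:00:00"
```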
Usage
The simplest way to get the list of your allocated subnets is to use the g5k-subnets script provided on the head node of the submission.
# g5k-subnets
10.8.0.0
10.8.8.0
Several other printing options are available (-p option to display the CIDR format, -b to display the broadcast address, -n to see the netmask, and -a, which is equivalent to -bnp):
# g5k-subnets -a
10.8.0.0/21 10.11.255.255 255.255.252.0 10.11.255.254
10.8.8.0/21 10.11.255.255 255.255.252.0 10.11.255.254
You can also summarize the subnets into a larger one if they are contiguous:
# g5k-subnets -sp
10.8.0.0/20
You can display all the available IP addresses in your reservation, and their associated unique MAC addresses, with the following command:
# g5k-subnets -im
10.158.16.1 00:16:3E:9E:10:01
...
Note: For detailed information, see the Subnet reservation page. The Grid5000:Network page also describes our organization of the virtual IP space inside Grid'5000.
Quick start
In this part, we will create a virtual machine in a few steps, and ssh to it.
Job submission
To easily test the KVM environment, we use an interactive job and reserve one subnet and one node with hardware virtualization capabilities.
Disk image, virtual machine
A disk image containing Debian 11 bullseye is available at the following path:
/grid5000/virt-images/debian11-x64-base.qcow2
You can copy it to the node. It will be the base image for our VMs:
If we want to create multiple VMs, we have to copy the qcow2 file once per VM.
To save storage space, we can instead use debian11-x64-base.qcow2 as a backing file:
By doing this, domain1.qcow2 only stores the differences from debian11-x64-base.qcow2 (and not the whole image).
If you want to create a second virtual machine based on the same image, simply run the same command with domain2.qcow2 instead of domain1.qcow2.
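The copy and overlay steps can be scripted for several VMs at once. This is a minimal sketch that assumes the base image has already been copied to /tmp. The function name create_overlays and the domainN.qcow2 naming are hypothetical. The sketch also passes backing_fmt=qcow2, a standard qcow2 creation option that declares the backing file format explicitly instead of letting qemu-img probe it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one copy-on-write overlay per VM on top of the base image.
# Assumes the base image was already copied to /tmp.
BASE_IMAGE=/tmp/debian11-x64-base.qcow2

create_overlays() {
  local count=$1 i
  for ((i = 1; i <= count; i++)); do
    # The overlay only records blocks that differ from BASE_IMAGE,
    # so BASE_IMAGE must not be modified while overlays depend on it.
    qemu-img create -f qcow2 \
      -o "backing_file=$BASE_IMAGE,backing_fmt=qcow2" \
      "/tmp/domain${i}.qcow2"
  done
}

# Usage, on the node:
#   cp /grid5000/virt-images/debian11-x64-base.qcow2 /tmp/
#   create_overlays 2    # creates /tmp/domain1.qcow2 and /tmp/domain2.qcow2
```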
Choose a MAC address
As seen before, g5k-subnets maintains a correspondence between MAC addresses and IP addresses. The Debian system provided on the disk image is configured to use DHCP and the DHCP server will assign the IP corresponding to the MAC address of the virtual machine.
Consequently, you have to choose an IP in the range you have reserved, and set the MAC address of the VM to the associated MAC address.
You can get the list of available IP addresses, each with an associated unique MAC address, with the following command:
10.172.0.1 00:16:3E:AC:00:01
10.172.0.2 00:16:3E:AC:00:02
10.172.0.3 00:16:3E:AC:00:03
10.172.0.4 00:16:3E:AC:00:04
10.172.0.5 00:16:3E:AC:00:05
10.172.0.6 00:16:3E:AC:00:06
10.172.0.7 00:16:3E:AC:00:07
10.172.0.8 00:16:3E:AC:00:08
10.172.0.9 00:16:3E:AC:00:09
10.172.0.10 00:16:3E:AC:00:0A
...
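Each line of the g5k-subnets -im output is an "IP MAC" pair, so finding the MAC address for a given IP takes a single awk filter. This sketch assumes that one-pair-per-line layout. The names mac_for_ip and ip_for_mac are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical lookups in the "IP MAC" listing printed by g5k-subnets -im (read on stdin).

# Print the MAC address paired with IP $1; fail if the IP is not listed.
mac_for_ip() {
  awk -v ip="$1" '$1 == ip { print $2; found = 1; exit } END { exit !found }'
}

# Print the IP paired with MAC $1 (case-insensitive); fail if not listed.
ip_for_mac() {
  awk -v mac="$1" 'toupper($2) == toupper(mac) { print $1; found = 1; exit } END { exit !found }'
}

# Usage, on the node:
#   g5k-subnets -im | mac_for_ip 10.172.0.1
#   g5k-subnets -im | ip_for_mac 00:16:3e:ac:00:01
```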
Run the guest OS using libvirt
Libvirt is a toolkit for managing virtualization servers. It is also an abstraction layer over different virtualization solutions, including KVM but also Xen and VMware ESX.
In our case, we use libvirt on top of KVM.
- Create an XML domain file describing a virtual machine, e.g. domain1.xml:
<domain type='kvm'>
<name>domain1</name>
<memory>2048000</memory>
<vcpu>1</vcpu>
<cpu mode='host-model'/>
<os>
<type arch="x86_64">hvm</type>
</os>
<clock offset="localtime"/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/domain1.qcow2'/>
<target dev='vda' bus='virtio'/>
</disk>
<interface type='bridge'>
<source bridge='br0'/>
<mac address='AA:BB:CC:DD:EE:FF'/> <!-- replace with a MAC address from g5k-subnets -im -->
</interface>
<serial type='pty'>
<source path='/dev/ttyS0'/>
<target port='0'/>
</serial>
<console type='pty'>
<source path='/dev/ttyS0'/>
<target port='0'/>
</console>
</devices>
</domain>
Now, we can run and manage our guest OS with virsh.
- Run the guest with the following command:
- We can check that our guest is running:
Id Name State
---------------------------------------
1 domain1 running
- You can connect to your VM console:
- The default root password is grid5000.
- Use CTRL+] to disconnect from virsh console (or CTRL+ALT+6 on macOS).
Note: To get a graphical user interface (GUI) for the VM, you can configure VNC access. To do so, add the following line to the <devices> section of the domain XML file:
<graphics type='vnc' listen='0.0.0.0' autoport='yes' keymap='en-us'/>
Get the VNC port used by the VM. Then, create an SSH tunnel to access the VM from your personal computer, and connect your VNC client to localhost:PORT.
- At this point, you can repeat the full process and launch several VMs in parallel.
- Stop the execution of your VM with:
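The virsh steps above can be wrapped in small shell helpers. This is a minimal sketch that assumes your user can run virsh against the node's libvirt daemon, as in this tutorial. The helper names domain_name, start_domain and vnc_port are hypothetical. The VNC helper relies on the standard VNC convention that display N listens on TCP port 5900 + N.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the virsh steps of this section.
# Assumes virsh can reach the node's libvirt daemon.

# Print the <name> of a domain, read from its XML file.
domain_name() {
  sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1" | head -n 1
}

# Start a transient domain from its XML file, then print its state.
start_domain() {
  virsh create "$1" >/dev/null || return
  virsh domstate "$(domain_name "$1")"
}

# Turn the output of `virsh vncdisplay DOMAIN` (e.g. ":1" or "127.0.0.1:1")
# into a TCP port: VNC display N listens on port 5900 + N.
vnc_port() {
  local display=${1##*:}
  echo $((5900 + display))
}

# Usage, on the node:
#   start_domain domain1.xml                 # should print "running"
#   virsh list                               # running domains
#   virsh console domain1                    # CTRL+] to leave
#   vnc_port "$(virsh vncdisplay domain1)"   # only if VNC is configured
#   virsh destroy domain1                    # stop (kill) the VM
```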
Run the guest OS using the qemu/kvm command
- You can also use the qemu/kvm command to start the virtual machine:
Or with a VNC display:
These are example commands. Feel free to adapt them to your use case. (The qemu process can be launched in a screen session so that it can be detached. If you are not familiar with screen, read its documentation.)
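This sketch shows a kvm invocation roughly equivalent to domain1.xml: 2 GB of RAM, one vCPU, a virtio disk, and a virtio NIC on br0. It assumes that qemu-bridge-helper is allowed to attach to br0 on the node, which is not verified here. start_kvm is a hypothetical name. Any extra arguments are passed straight through to kvm, which is where the display options go.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: a kvm command roughly matching domain1.xml.
# Assumes qemu-bridge-helper may use br0 on the node.
#   $1: disk image, $2: MAC address from g5k-subnets -im,
#   remaining arguments: passed to kvm unchanged (display options)
start_kvm() {
  local disk=$1 mac=$2
  shift 2
  kvm -m 2048 -smp 1 -cpu host \
    -drive "file=$disk,if=virtio,format=qcow2" \
    -netdev bridge,id=net0,br=br0 \
    -device "virtio-net-pci,netdev=net0,mac=$mac" \
    "$@"
}

# Usage, on the node (e.g. inside a screen session):
#   start_kvm /tmp/domain1.qcow2 00:16:3E:AC:00:01 -nographic   # console in the terminal
#   start_kvm /tmp/domain1.qcow2 00:16:3E:AC:00:01 -vnc :1      # VNC display 1 (port 5901)
```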
SSH to your virtual machine
Root SSH authentication with a password is disabled by default. To SSH to your VM, follow these steps:
- Log into your VM console using virsh console domain1. The root password is grid5000.
- Run these commands to allow root login with a password in the SSH configuration, and reload the SSH daemon:
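The SSH configuration change can be done with sed. This is a minimal sketch to run as root inside the VM, and it assumes the options live in /etc/ssh/sshd_config. On Debian 11, that file includes /etc/ssh/sshd_config.d/*.conf at its top. Because sshd keeps the first value it reads, a file there that sets the same options would take precedence. The names set_sshd_option and allow_root_password_login are hypothetical.

```shell
#!/usr/bin/env bash
# Run as root inside the VM (e.g. from `virsh console domain1`).

# Set "KEY VALUE" in an sshd configuration file: replace any existing line for
# KEY (commented out or not), or append the setting if KEY does not appear.
set_sshd_option() {
  local file=$1 key=$2 value=$3
  if grep -Eq "^#?[[:space:]]*${key}[[:space:]]" "$file"; then
    sed -i -E "s/^#?[[:space:]]*${key}[[:space:]].*/${key} ${value}/" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Allow root to log in over SSH with a password.
allow_root_password_login() {
  local file=${1:-/etc/ssh/sshd_config}
  set_sshd_option "$file" PermitRootLogin yes
  set_sshd_option "$file" PasswordAuthentication yes
}

# Usage, inside the VM:
#   allow_root_password_login && systemctl reload ssh
```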
Finally, you can ssh directly to your VM from anywhere in Grid'5000:
Contextualize your VMs with cloud-init
As we have seen, we must use the console of our VM to configure SSH before we can connect to it. This becomes tedious with many VMs, since we would have to configure SSH manually on each instance through its console.
This part describes how to contextualize your VM using cloud-init.
cloud-init ( https://cloudinit.readthedocs.io/en/latest/ ) runs when the VM starts and searches for a datasource from which to fetch the configuration to apply to the VM, such as:
- Set the hostname
- Create users
- Copy SSH key to root account
- Mount a device
- Execute a script
- ...
This process is called contextualization.
On Grid'5000, this datasource is a virtual disk (.iso) that contains the configurations we want.
Create a virtual disk for cloud-init
In this example, we will create a CD containing a simple contextualization configuration for cloud-init: it will change the hostname of the VM and add your public SSH key to the root account.
To help you create cloud-init configuration files, a script, cloud-init-example.sh, is available; you can copy it to your node:
This script will generate basic configuration files for cloud-init that add your public SSH key to the root account, so that you can SSH to the VM without a password and without using the console.
Note: The previous command assumes your SSH public key is in
Two files were created in cloud-init-data: meta-data and user-data.
- meta-data contains configuration such as the hostname, the root SSH key, the instance id, etc. You can see that the script wrote your SSH public key to this file.
- user-data can contain further configuration, in different formats:
- It can be a bash script that will be executed on startup
- It can be a YAML file that describes configuration such as creating users, mounting a device, running puppet, changing resolv.conf, etc. (for other examples, see https://cloudinit.readthedocs.io/en/latest/topics/examples.html# )
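For illustration, here is a hypothetical stand-in for what such a generator writes. It is not the content of cloud-init-example.sh. It assumes cloud-init's NoCloud datasource conventions: meta-data holds instance-id, local-hostname and public-keys, and user-data starts with #cloud-config. The hostname example-vm and the disable_root setting match the files described in this section. write_cloud_init_data is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a cloud-init data generator (not cloud-init-example.sh).
#   $1: output directory, $2: hostname, $3: file with one or more SSH public keys
write_cloud_init_data() {
  local dir=$1 hostname=$2 pubkey_file=$3 key
  mkdir -p "$dir" || return
  {
    printf 'instance-id: %s\n' "$hostname"
    printf 'local-hostname: %s\n' "$hostname"
    printf 'public-keys:\n'
    # One YAML list item per non-empty line of the key file
    while IFS= read -r key || [[ -n $key ]]; do
      if [[ -n $key ]]; then
        printf '  - %s\n' "$key"
      fi
    done < "$pubkey_file"
  } > "$dir/meta-data"
  # cloud-config: keep root enabled so the key also works for root
  printf '#cloud-config\ndisable_root: false\n' > "$dir/user-data"
}

# Usage, on the node in /tmp (the key path is a placeholder):
#   write_cloud_init_data cloud-init-data example-vm path/to/your_key.pub
```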
Now, we can generate an ISO file using the following command:
On the node, in /tmp:
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data
The file cloud-init-data.iso is ready to be attached to a VM.
Cloud-init will detect the disk on startup and configure the virtual machine using the information in meta-data and user-data on the CD.
Start a VM with contextualization
We will run a new VM with contextualization. First, we create a new disk image from our base image:
On the node, in /tmp:
qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/contextualized-domain.qcow2
We create a new contextualized-domain.xml with this content:
<domain type='kvm'>
<name>contextualized-domain</name>
<memory>2048000</memory>
<vcpu>1</vcpu>
<cpu mode='host-model'/>
<os>
<type arch="x86_64">hvm</type>
</os>
<clock offset="localtime"/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/contextualized-domain.qcow2'/>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='file' device='cdrom'>
<source file='/tmp/cloud-init-data.iso'/>
<target dev='vdb' bus='virtio'/>
<readonly/>
</disk>
<interface type='bridge'>
<source bridge='br0'/>
<mac address='AA:BB:CC:DD:EE:FF'/>
</interface>
<serial type='pty'>
<source path='/dev/ttyS0'/>
<target port='0'/>
</serial>
<console type='pty'>
<source path='/dev/ttyS0'/>
<target port='0'/>
</console>
</devices>
</domain>
This is the same XML file as in the previous section, except for the added cdrom device (the second <disk> element), which attaches the ISO to the VM as a CD-ROM.
Note: Don't forget to replace the MAC address with one of the MAC addresses from your g5k-subnets reservation.
Start the guest OS and connect to it using SSH:
You can now SSH to your VM without a password and without having to use the VM console. Notice that the hostname also changed to example-vm, as specified in the meta-data file.
Playing with cloud-init
This optional part shows more examples of what can be done with cloud-init.
Run a script on startup with user-data
The current content of user-data is:
#cloud-config
disable_root: false
By default, cloud-init disables the root account.
If you try to connect as root to the VM without enabling the root account in user-data, you will get a message saying you need to connect as user debian (your public key will be accepted for user debian). This user has sudo rights.
The user-data file starts with #cloud-config, telling cloud-init that the file is in cloud-config format. As we will see in the next part, this is a YAML file that describes the configuration cloud-init has to apply on boot.
But user-data can also be a bash script, and that's what we will do here. Replace the content of user-data with:
#!/bin/bash
apt-get update && apt-get install -y lighttpd
cat << EOF > /var/www/html/index.lighttpd.html
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>VM HTTP Server</title>
</head>
<body>
<h1> Installed and configured with cloud-init </h1>
</body>
</html>
EOF
Generate the ISO file with this new configuration:
On the node, in /tmp:
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data
We will destroy our running VM, create a fresh disk from debian11-x64-base.qcow2 and restart it:
On the node, in /tmp:
qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/contextualized-domain.qcow2
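This destroy, recreate and restart cycle repeats every time user-data changes, so it can be wrapped in a function. The sketch below uses the paths and domain name of this section, and recreate_contextualized_vm is a hypothetical name. It also rebuilds the ISO. Starting from a fresh overlay discards the previous boot, including cloud-init's saved state inside the guest, so the new configuration is applied on the next boot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the contextualized VM from a clean disk.
# Paths and domain name follow this section.
recreate_contextualized_vm() {
  # The domain may not be running: ignore a failing destroy.
  virsh destroy contextualized-domain 2>/dev/null || true
  # Rebuild the ISO so that it contains the latest user-data and meta-data.
  genisoimage -output /tmp/cloud-init-data.iso -volid cidata -joliet -rock \
    /tmp/cloud-init-data/user-data /tmp/cloud-init-data/meta-data || return
  # Overwrite the overlay: the guest boots as if for the first time.
  qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 \
    /tmp/contextualized-domain.qcow2 || return
  virsh create /tmp/contextualized-domain.xml
}

# Usage, on the node; once the VM has booted (VM_IP is a placeholder):
#   recreate_contextualized_vm
#   curl http://VM_IP/
```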
After a few moments, we can try:
The script in user-data ran on startup. It installed lighttpd, a small HTTP server, and replaced its default index page.
We can still SSH to our VM, but not as root, since we removed the option that enabled the root account in user-data:
Going further with user-data in YAML
We have seen how to use user-data as a startup script.
We will now use it in cloud-config format: a YAML description file that triggers actions on startup.
Copy the following content to user-data and insert your public SSH key where needed. Then regenerate the iso file :
#cloud-config
groups:
- foo
- bar
users:
- name: foo
primary-group: foo
groups: users
shell: /bin/bash
ssh-authorized-keys:
- <insert your public key here>
- name: bar
primary-group: bar
groups: users
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh-authorized-keys:
- <insert your public key here>
packages:
- lighttpd
On the node, in /tmp:
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data
The advantage of this format is readability: we can quickly identify what it will do:
- Create a user foo in group foo (with no sudo rights)
- Create a user bar in group bar with sudo rights
- Install the package lighttpd
You can create a new VM to test this new configuration :
On the node, in /tmp:
qemu-img create -f qcow2 -o backing_file=/tmp/debian11-x64-base.qcow2 /tmp/contextualized-domain.qcow2
You'll be able to connect without a password as foo and bar.
For more examples of what can be done with cloud-config: http://cloudinit.readthedocs.io/en/latest/topics/examples.html
Add cloud-init to your own image
You may want to add cloud-init to your own virtual image so you can run custom VMs on Grid'5000.
To do so, simply run
in your VM before exporting it as qcow2.
You can also use virt-customize for an already existing qcow2 :
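For an existing image, here is a sketch using virt-customize from the libguestfs tools. It assumes the image is not in use by a running VM and that the virt-customize appliance can reach a Debian mirror. add_cloud_init is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install cloud-init into an existing qcow2 image.
add_cloud_init() {
  local image=$1
  if [[ ! -f $image ]]; then
    echo "no such image: $image" >&2
    return 1
  fi
  # --install uses the guest's own package manager (apt on Debian images)
  virt-customize -a "$image" --install cloud-init
}

# Usage, with libguestfs-tools installed and the VM shut down:
#   add_cloud_init /path/to/your-image.qcow2
```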
Multi-site experiment
In this part, to illustrate what can be done using virtual machines on the standard environment, we will start two virtual machines on two sites and make them communicate using the virtualization network.
Reservation
Open 2 terminals and SSH to the frontends of 2 sites (in this example, the Luxembourg and Nancy frontends). Then, reserve two virtualization-capable nodes and two subnets on the two different sites.
For the rest of the multi-site experiment part, don't forget to run each command on both sites.
Network configuration
In this part, we will choose an IP for the 2 virtual machines.
Choose an IP/MAC pair for each VM from the output of g5k-subnets -im.
Note that g5k-subnets returns completely different information on each site. In the following, we assume that you chose 10.144.8.1 (00:16:3e:90:08:01) in Nancy, and 10.172.0.1 (00:16:3e:ac:00:01) in Luxembourg.
Instantiate your VMs
Copy a standard virtual machine image
Copy the default virtual machine image from /grid5000/virt-images/debian11-x64-base.qcow2 to /tmp on both nodes:
Configure cloud-init
To be able to SSH without a password, we will use cloud-init:
On both nodes:
genisoimage -output cloud-init-data.iso -volid cidata -joliet -rock cloud-init-data/user-data cloud-init-data/meta-data
Create the domain.xml file
The domain.xml file contains the description of your virtual machine.
Create it on both sites and adapt it to use a MAC address provided by g5k-subnets -im. The virtual machine will get the IP associated with its MAC address:
<domain type='kvm'>
<name>buster</name>
<memory>2048000</memory>
<vcpu>1</vcpu>
<cpu mode='host-model'/>
<os>
<type arch="x86_64">hvm</type>
</os>
<clock offset="localtime"/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/debian11-x64-base.qcow2'/>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='file' device='cdrom'>
<source file='/tmp/cloud-init-data.iso'/>
<target dev='vdb' bus='virtio'/>
<readonly/>
</disk>
<interface type='bridge'>
<source bridge='br0'/>
<mac address='AA:BB:CC:DD:EE:FF'/>
</interface>
<serial type='pty'>
<source path='/dev/ttyS0'/>
<target port='0'/>
</serial>
<console type='pty'>
<source path='/dev/ttyS0'/>
<target port='0'/>
</console>
</devices>
</domain>
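Editing the MAC address by hand on each site is error-prone. This minimal sketch assumes the domain file contains a <mac address='...'/> element written on a single line, as in the example above. set_domain_mac is a hypothetical name, and it rejects anything that does not look like a MAC address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the MAC address in a libvirt domain file.
#   $1: domain XML file, $2: MAC address taken from g5k-subnets -im
set_domain_mac() {
  local xml=$1 mac=$2
  local re='^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
  if [[ ! $mac =~ $re ]]; then
    echo "invalid MAC address: $mac" >&2
    return 1
  fi
  # Rewrite the <mac address='...'/> element on every line where it appears
  sed -i "s|<mac address='[^']*'/>|<mac address='$mac'/>|" "$xml"
}

# Usage, on each node (first pair of the reservation, as in the create_tap example):
#   set_domain_mac domain.xml "$(g5k-subnets -im | head -n 1 | awk '{print $2}')"
```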
Launch the two VMs
Enjoy!
SSH in your VMs
Install and run iperf
Finally, we will install iperf and measure the bandwidth between the two VMs:
- install iperf with apt-get;
- then, run iperf in server mode (-s parameter) on one VM, and in client mode (-c parameter) on the other.
root@vm-1:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.144.8.1 port 5001 connected with 10.172.0.1 port 52389
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.09 GBytes 938 Mbits/sec

root@vm-1:~# iperf -c 10.144.8.1
------------------------------------------------------------
Client connecting to 10.144.8.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.172.0.1 port 52389 connected with 10.144.8.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 938 Mbits/sec
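When the measurement is repeated, the bandwidth figure can be extracted from the last report line of the iperf output. This sketch assumes the iperf 2 report format shown above. iperf_bandwidth is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bandwidth (value and unit) from iperf output
# read on stdin, taken from the last line ending in a .../sec rate.
iperf_bandwidth() {
  awk '/bits\/sec/ { bw = $(NF-1) " " $NF } END { if (bw == "") exit 1; print bw }'
}

# Usage, on the client VM (server VM at 10.144.8.1, three runs):
#   for run in 1 2 3; do iperf -c 10.144.8.1 | iperf_bandwidth; done
```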
SR-IOV and Virtual Function PCI passthrough
With SR-IOV, modern network interface cards can create "Virtual Functions": they are basically virtual network interfaces that appear as separate PCI devices.
An interesting usage is to allocate each virtual function to a virtual machine, using PCI passthrough. This way, the virtual machine has direct access to the hardware for maximum performance, while the physical interface can still be shared between many different virtual machines.
Support for SR-IOV on Grid'5000
As of June 2021, SR-IOV is enabled on the BIOS of most clusters.
However, SR-IOV must also be enabled at all layers of the stack: network interface firmware, Linux kernel, driver. An overview of which clusters have effective SR-IOV support can be seen at: Hardware#Nodes_with_SR-IOV_support. This table also lists the maximum number of VFs that can be enabled on each interface.
Support for Mellanox Ethernet cards is planned. Support for Intel i40e cards is currently not possible (bug #12695).
Contact us if you have requirements that are not covered by the current state.
Using SR-IOV on Grid'5000
As this is an advanced virtualisation usage, the Grid'5000 team cannot maintain a detailed tutorial. There are good SR-IOV resources on the Internet.
However, here is an overview of the required steps on Grid'5000:
- You must deploy an environment to have full control over the system, such as the kernel command line. Using debian11-x64-big is a good start, since qemu is already installed.
- To enable PCI passthrough, you will probably need to pass parameters to the kernel command line: see CPU_parameters#Setting_kernel_parameters_with_Kadeploy
- Once booted, create virtual functions with echo X > /sys/class/net/$IFACE/device/sriov_numvfs
- Choose a MAC address in your subnet reservation (g5k-subnets -im from your reservation on the frontend)
- Change the MAC address of your Virtual Function interface (e.g. ip link set eno1v0 address AA:BB:CC:DD:EE:FF)
- Identify the PCI address of your Virtual Function interface (for instance with lshw -c network -businfo)
- Configure libvirt or qemu with PCI passthrough so that the virtual machine takes ownership of the Virtual Function
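With libvirt, a Virtual Function can be handed to a VM through an <interface type='hostdev'> element. This sketch turns the bus info printed by lshw (e.g. pci@0000:3b:02.0, an illustrative address) into such an element. It uses managed='yes', so libvirt detaches the VF from its host driver by itself. vf_hostdev_xml is a hypothetical name. The resulting element goes in the <devices> section of the domain XML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a libvirt hostdev interface for a Virtual Function.
#   $1: PCI address, with or without lshw's "pci@" prefix (DDDD:BB:SS.F)
#   $2: MAC address taken from g5k-subnets -im
vf_hostdev_xml() {
  local addr=${1#pci@} mac=$2
  local re='^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$'
  if [[ ! $addr =~ $re ]]; then
    echo "unexpected PCI address: $1" >&2
    return 1
  fi
  cat <<EOF
<interface type='hostdev' managed='yes'>
  <mac address='$mac'/>
  <source>
    <address type='pci' domain='0x${BASH_REMATCH[1]}' bus='0x${BASH_REMATCH[2]}' slot='0x${BASH_REMATCH[3]}' function='0x${BASH_REMATCH[4]}'/>
  </source>
</interface>
EOF
}

# Usage, on the deployed node:
#   vf_hostdev_xml pci@0000:3b:02.0 00:16:3E:AC:00:01
```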
Alternative: running qemu in user space using the Grid5000 create_tap script
Thanks to the create_tap script, a normal user can create a bridged VM with qemu, without using sudo-g5k.
Connect to a frontend and reserve a node and a dedicated network for the VM:
ssh nancy.g5k
oarsub -I -l slash_22=1+{"virtual!='NO'"}/nodes=1,walltime=2:00:00
oarsub -C
Create a minimalistic image file:
qemu-img create -f qcow2 test.img 1G
Call create_tap to create a tap0 device:
sudo create_tap
NB: do not use sudo-g5k but sudo instead, as create_tap is a sudoable command granted to all users.
Launch the virtual machine:
qemu-system-x86_64 -nographic -m 1G -hda test.img -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,disable-modern=off,netdev=mynet0,mac=$(g5k-subnets -i -m |head -1|awk '{print $2}')
Press Ctrl-B quickly to access the iPXE command line, then run the following commands to get and check the expected IP:
dhcp net0
show net0/ip
To terminate the qemu session, simply use the Ctrl-A X keystroke.