KaVLAN
Overview
KaVLAN lets users manage VLANs on Grid'5000 nodes. The benefit is complete layer 2 isolation. It can be used together with OAR and Kadeploy to run experiments on the grid.
The following figure shows two jobs running with KaVLAN: each job has its nodes isolated in a VLAN (purple and green). The other nodes are all in the default VLAN (red). The only way to reach the isolated nodes is through a gateway node (kavlan-1 and kavlan-2 in the figure). The gateway has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. This way, you can use ssh to reach your nodes (another way to reach an isolated node is to use the kaconsole command).
Currently, KaVLAN can be used on a single site only. The technical team is developing an extension that uses QinQ in Grid'5000 to allow grid-wide VLANs.
Installation status by site:
Site | Version | Status |
---|---|---|
Bordeaux | | |
Grenoble | | |
Lille | 1.0rc3 | |
Lyon | | |
Nancy | | |
Orsay | | |
Rennes | | |
Sophia | 1.0rc5 | |
Toulouse | | |
Usage
How to reserve a VLAN
KaVLAN only works with deploy reservations; to obtain nodes and a VLAN, simply add the -t kavlan option to oarsub. For example, if you need 3 nodes and a VLAN:
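The reservation could look like the sketch below. Only the -t kavlan option comes from this page; -t deploy, -l nodes=N and -I are standard oarsub options that are assumed to fit here, and kavlan_reserve_cmd is a hypothetical helper that only prints the command line so that it can be reviewed before being run on a frontend.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an oarsub command line that asks
# for N nodes plus a VLAN. Assumption: an interactive (-I) deploy job is
# wanted; only "-t kavlan" is taken from this page.
kavlan_reserve_cmd() {
    nodes=$1
    case $nodes in
        ''|*[!0-9]*)
            echo "usage: kavlan_reserve_cmd <node-count>" >&2
            return 1 ;;
    esac
    printf 'oarsub -t deploy -t kavlan -l nodes=%s -I\n' "$nodes"
}

# The 3-node case mentioned in the text.
kavlan_reserve_cmd 3
```

Printing rather than executing keeps the sketch harmless on machines without OAR; adjust the options to your site's policy before using the result.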
Then you can get the ID of your VLAN using the kavlan command (kavlan -V).
If you run this command outside the shell started by OAR for your reservation, you must pass the OAR job ID with -j JOBID.
You should get an integer in the 1-8 range.
You can get all the options of the command using --help:
# kavlan --help
Version 1.0rc2
USAGE : kavlan [options]
  -r|--get-network-range
  -g|--get-network-gateway
  -l|--get-nodelist
  -V|--get-vlan-id            print VLAN ID of job (needs -j JOBID)
  -d|--disable-dhcp
  -e|--enable-dhcp
  -i|--vlan_id <VLANID>
  -s                          set vlan for given node(s)
  -f|--filenode <NODEFILE>
  -j|--oar-jobid=<JOBID>
  -m|--machine <nodename>
  -q|--quiet                  quiet mode
  -h|--help                   print this help
  -v|--verbose                verbose mode
Once you have a kavlan reservation running, you are allowed to connect to the VLAN gateway named kavlan-<ID>, where ID is your VLAN ID. You can also put your nodes in your VLAN (and back into the default VLAN) at any time during the lifetime of your job.
Since KaVLAN works only with deploy jobs, the next step is to deploy at least one node (otherwise, you won't have root access on it and therefore can't restart its network configuration).
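The gateway naming rule can be captured in a small helper. This is a minimal sketch: kavlan_gateway is a hypothetical function name, and the 1-8 bound is simply the range quoted on this page, so it may differ on other installations.

```shell
#!/bin/sh
# Print the gateway host name for a VLAN ID, following the kavlan-<ID>
# naming described in the text. IDs outside 1-8 are rejected.
kavlan_gateway() {
    id=$1
    case $id in
        [1-8]) printf 'kavlan-%s\n' "$id" ;;
        *)
            echo "invalid VLAN ID: '$id' (expected 1-8)" >&2
            return 1 ;;
    esac
}

# On a frontend one would typically feed it the output of `kavlan -V`:
#   ssh "$(kavlan_gateway "$(kavlan -V)")"
kavlan_gateway 7
```

Validating the ID first avoids an ssh attempt to a non-existent host when kavlan -V is run outside a job and prints nothing usable.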
Let's say you want to deploy all nodes using the lenny-x64-base environment:
Enable/disable the dhcp server of the gateway
Once the deployment is over, you can change the VLAN of your nodes. First make sure that the DHCP server is running on the gateway: on the frontend, enable it with kavlan -e (add -j JOBID if needed).
You can disable the DHCP server with kavlan -d
Change the VLAN of your nodes
To change the VLAN of the nodes, you must reconfigure their network after the VLAN has changed; but once the VLAN has changed, you can no longer connect to the nodes! An easy way around this is to schedule the reconfiguration in advance with the 'at' command (apt-get install at if it is not installed on your nodes).
We will use Taktuk to start remote commands on several nodes at once. In this example, we will use all the nodes. Since taktuk does not handle duplicate names in the nodefile, we must first remove duplicates.
First, we will use taktuk to install at on all nodes; then a second taktuk command will schedule the network reconfiguration to run in one minute. Finally, we set the VLAN of all our nodes.
$ uniq $OAR_NODEFILE > ./mynodes
$ taktuk -s -l root -f ./mynodes broadcast exec [ "apt-get update; apt-get --yes install at" ]
$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
$ kavlan -s
Take node list from OAR nodefile: /var/lib/oar/387465
 ... node azur-25.sophia.grid5000.fr changed to vlan KAVLAN-7
 ... node azur-28.sophia.grid5000.fr changed to vlan KAVLAN-7
 ... node azur-30.sophia.grid5000.fr changed to vlan KAVLAN-7
all nodes are configured in the vlan 7
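Note that uniq only collapses adjacent duplicate lines, which is sufficient when the nodefile lists each node's entries consecutively. If that ordering cannot be relied upon (an assumption worth checking on your site), sort -u is a more defensive alternative. The sketch below uses a hypothetical helper, dedup_nodes, and illustrative node names.

```shell
#!/bin/sh
# Write the distinct node names found in a nodefile to an output file.
# sort -u removes duplicates even when they are not adjacent; the result
# is sorted, which is harmless for a taktuk node list.
dedup_nodes() {
    sort -u "$1" > "$2"
}

# Self-contained demo on a throwaway file (node names are illustrative).
demo=$(mktemp)
printf '%s\n' azur-25 azur-28 azur-25 azur-30 azur-28 > "$demo"
dedup_nodes "$demo" "$demo.uniq"
cat "$demo.uniq"
rm -f "$demo" "$demo.uniq"
```

In a real job the call would be dedup_nodes "$OAR_NODEFILE" ./mynodes, after which ./mynodes can be passed to taktuk -f as above.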
In one minute, your nodes will renegotiate their IP addresses and will be available inside the VLAN. To get the name of your nodes in the VLAN, use the -l option:
$ kavlan -l
azur-25-kavlan-7.sophia.grid5000.fr
azur-28-kavlan-7.sophia.grid5000.fr
azur-30-kavlan-7.sophia.grid5000.fr
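The names printed by kavlan -l appear to follow a simple pattern: -kavlan-<ID> is inserted after the short host name. The sketch below reproduces that pattern with a hypothetical helper, kavlan_name. The convention is inferred from the output shown on this page, so kavlan -l remains the authoritative source.

```shell
#!/bin/sh
# Map a default-VLAN host name to its in-VLAN name by inserting
# "-kavlan-<ID>" after the short host name, for example
#   azur-25.sophia.grid5000.fr -> azur-25-kavlan-7.sophia.grid5000.fr
# Inferred naming convention, not an official interface.
kavlan_name() {
    host=$1
    id=$2
    case $host in
        *.*) printf '%s-kavlan-%s.%s\n' "${host%%.*}" "$id" "${host#*.}" ;;
        *)   printf '%s-kavlan-%s\n' "$host" "$id" ;;
    esac
}

kavlan_name azur-25.sophia.grid5000.fr 7
```

This can be handy for building a host name before the node has actually been moved, for instance when preparing scripts ahead of time.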
You can connect to each of them using kaconsole or ssh (first, you must connect to the gateway of the VLAN):
$ VLANID=`kavlan -V`
$ ssh kavlan-$VLANID
kavlan-7@sophia$ ssh root@azur-25-kavlan-7
You can use the ip neigh command to see the known hosts in your LAN; you should only see IPs in the 192.168.66.0/24 subnet:
azur-25-kavlan-7:~$ ip neigh
192.168.66.250 dev eth0 INCOMPLETE
192.168.66.254 dev eth0 lladdr 02:00:00:00:01:02 REACHABLE
You should be able to ping another of your hosts inside your VLAN:
azur-25-kavlan-7:~# ping -c 3 azur-30-kavlan-7
PING azur-30-kavlan-7.sophia.grid5000.fr (192.168.66.30) 56(84) bytes of data.
64 bytes from azur-30.local (192.168.66.30): icmp_seq=1 ttl=64 time=0.154 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=3 ttl=64 time=0.163 ms
--- azur-30-kavlan-7.sophia.grid5000.fr ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.154/0.162/0.170/0.012 ms
You can configure ssh to make the connection through the gateway transparent:
Configure ssh to easily connect to nodes in a VLAN
To use ssh transparently to access isolated nodes, add this to your .ssh/config file on the frontend:
Host *-*-kavlan-1 *-*-kavlan-1.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-1 nc %h %p
Host *-*-kavlan-2 *-*-kavlan-2.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-2 nc %h %p
Host *-*-kavlan-3 *-*-kavlan-3.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-3 nc %h %p
Host *-*-kavlan-4 *-*-kavlan-4.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-4 nc %h %p
Host *-*-kavlan-5 *-*-kavlan-5.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-5 nc %h %p
Host *-*-kavlan-6 *-*-kavlan-6.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-6 nc %h %p
Host *-*-kavlan-7 *-*-kavlan-7.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-7 nc %h %p
Host *-*-kavlan-8 *-*-kavlan-8.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-8 nc %h %p
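Since the eight stanzas differ only in the VLAN number, they can also be generated rather than typed. The following is a minimal sketch; kavlan_ssh_config is a hypothetical helper, and the stanza text is copied from the configuration above.

```shell
#!/bin/sh
# Print one Host/ProxyCommand stanza per VLAN ID in the given range
# (defaults to 1..8, the range quoted on this page).
kavlan_ssh_config() {
    first=${1:-1}
    last=${2:-8}
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'Host *-*-kavlan-%s *-*-kavlan-%s.*.grid5000.fr\n' "$i" "$i"
        # %%h and %%p are emitted literally as %h and %p for ssh to expand.
        printf '    ProxyCommand ssh -q -a -x kavlan-%s nc %%h %%p\n' "$i"
        i=$((i + 1))
    done
}

# Review the output, then append it to ~/.ssh/config if it looks right.
kavlan_ssh_config 1 8
```

The function only writes to standard output, so nothing is changed in your ssh configuration until you redirect the result yourself.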
Then you can simply use ssh <cluster>-<nodeid>-kavlan-<vlanid> to access the node, for example ssh root@azur-25-kavlan-7.
Put your nodes back into the default VLAN
First, put the list of your nodes' in-VLAN names (as printed by kavlan -l) in a file, here ./mynodes-vlan:
Don't forget to schedule the network restart with taktuk first:
frontend$ taktuk -s -l root -f ./mynodes-vlan broadcast exec [ "echo '/etc/init.d/networking restart' | at now + 1 minute " ]
Then you can put your nodes back in the default VLAN:
You should be able to ping your nodes:
for i in `uniq $OAR_NODEFILE`; do ping -c 1 $i; done
PING azur-25.sophia.grid5000.fr (138.96.20.25) 56(84) bytes of data.
64 bytes from azur-25.sophia.grid5000.fr (138.96.20.25): icmp_seq=1 ttl=64 time=1002 ms
--- azur-25.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1002.910/1002.910/1002.910/0.000 ms
PING azur-28.sophia.grid5000.fr (138.96.20.28) 56(84) bytes of data.
64 bytes from azur-28.sophia.grid5000.fr (138.96.20.28): icmp_seq=1 ttl=64 time=1.23 ms
--- azur-28.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.234/1.234/1.234/0.000 ms
PING azur-30.sophia.grid5000.fr (138.96.20.30) 56(84) bytes of data.
64 bytes from azur-30.sophia.grid5000.fr (138.96.20.30): icmp_seq=1 ttl=64 time=1.25 ms
--- azur-30.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.259/1.259/1.259/0.000 ms
Advanced usage
Set up a DHCP server on your nodes
If you need to run your own DHCP server (for example, if you want to run a cluster distribution inside KaVLAN or test Kadeploy), you can use the configuration file available on the VLAN's gateway.
Let's say that you want to install dhcpd on azur-25-kavlan-7. You first have to install a DHCP server on this node (we assume the node is not yet in the job VLAN):
Then, copy the configuration file from the gateway to the node:
Then we must isolate our nodes before starting the DHCP server:
frontend$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
frontend$ kavlan -s
Wait one minute. You can then start your server, once you have disabled the gateway's DHCP server (kavlan -d).
On the frontend:
then on the node:
Then, in another shell, connect as root on a second node:
And restart the network configuration:
azur-30-kavlan-7:~# /etc/init.d/networking restart
Reconfiguring network interfaces...There is already a pid file /var/run/dhclient.eth1.pid with pid 5319
killed old client process, removed PID file
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Listening on LPF/eth1/00:11:25:c4:d9:c5
Sending on LPF/eth1/00:11:25:c4:d9:c5
Sending on Socket/fallback
DHCPRELEASE on eth1 to 192.168.66.254 port 67
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Listening on LPF/eth1/00:11:25:c4:d9:c5
Sending on LPF/eth1/00:11:25:c4:d9:c5
Sending on Socket/fallback
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 14
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4
DHCPOFFER from 192.168.66.19
DHCPREQUEST on eth1 to 255.255.255.255 port 67
DHCPACK from 192.168.66.19
bound to 192.168.66.2 -- renewal in 41122 seconds.
done.
On the DHCP server, check the logs:
azur-25-kavlan-7:~# tail /var/log/messages
Mar 17 16:22:51 azur-25 dhcpd: Copyright 2004-2008 Internet Systems Consortium.
Mar 17 16:22:51 azur-25 dhcpd: All rights reserved.
Mar 17 16:22:51 azur-25 dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 deleted host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 new dynamic host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 leases to leases file.
Mar 17 16:25:27 azur-25 dhcpd: DHCPDISCOVER from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPOFFER on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPREQUEST for 192.168.66.2 (192.168.66.19) from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPACK on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
In the last four lines, you can see that your own DHCP server has given an address to the other node.
DHCP + PXE
If you need to do PXE boot, you must change the TFTP server in the dhcpd configuration file:
If there is no next-server configured, you must edit the file by hand and add a line like this:
next-server XX.XX.XX.XX ;
where XX.XX.XX.XX is the IP of your node (echo $IP).
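The edit described above could be scripted along the following lines. This is only one possible approach, not the command originally used on this page: set_next_server is a hypothetical helper, the caller supplies the path of the dhcpd configuration file, and the address in the demo is the DHCP server address seen in the logs above.

```shell
#!/bin/sh
# Set the next-server address in a dhcpd configuration file.
# If a next-server line exists it is rewritten in place (indentation kept);
# otherwise a global next-server line is appended at the end of the file.
set_next_server() {
    conf=$1
    ip=$2
    if grep -q '^[[:space:]]*next-server[[:space:]]' "$conf"; then
        tmp=$(mktemp) || return 1
        sed "s/^\([[:space:]]*\)next-server[[:space:]].*/\1next-server $ip;/" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        printf 'next-server %s;\n' "$ip" >> "$conf"
    fi
}

# Self-contained demo on a throwaway file.
demo=$(mktemp)
printf 'filename "pxelinux.0";\nnext-server 10.0.0.1;\n' > "$demo"
set_next_server "$demo" 192.168.66.19
cat "$demo"
rm -f "$demo"
```

A temporary file is used instead of sed -i so that the sketch does not depend on a particular sed implementation. Restart dhcpd after changing its configuration.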