KaVLAN: Difference between revisions

From Grid5000
Once you have a kavlan reservation running, you can put your nodes in your VLAN (and back into the default VLAN) at any time during the lifetime of your job. For a local VLAN, you are also allowed to connect to the VLAN gateway named <code class='hostname'>kavlan-<ID></code>, where ID is your VLAN ID.


Since KaVLAN works only with deploy jobs, the next step is to deploy at least one node (otherwise you won't have root access on it and therefore can't restart its network configuration).


 
If you want to learn how to use KaVLAN, you can try the tutorial on [[Network isolation on Grid'5000]].
== Enable/disable the dhcp server of the VLAN ==
Once you have reserved a VLAN, you are allowed to start/stop the DHCP server of the VLAN.
Do this with the kavlan command (add ''-j JOBID'' if you run it outside the shell of your job):
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -e}}
 
(You can disable the DHCP server with <code class='command'>kavlan -d</code>)
 
== Deploy nodes and change VLAN in one step ==
 
Kadeploy can change the VLAN of the nodes it deploys.
For example, to deploy all nodes with the lenny-x64-base environment in your VLAN (id=4):
{{Term|location=frontend|cmd=<code class="command">kadeploy3</code> -f $OAR_NODEFILE -k -e <code class="replace">lenny-x64-base</code> --vlan <code class="replace">4</code> }}
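
When the VLAN ID is not known in advance, it can be read with <code>kavlan -V</code> and passed on to kadeploy3. The sketch below only assembles and prints the command line (a dry run) after checking that the ID is a plain integer. The function name <code>kadeploy_vlan_cmd</code> is hypothetical and not part of Kadeploy or KaVLAN.

```shell
# Hypothetical helper: print the kadeploy3 command for a given VLAN.
# Usage: kadeploy_vlan_cmd NODEFILE ENVIRONMENT VLANID
kadeploy_vlan_cmd() {
    nodefile=$1 env_name=$2 vlan=$3
    # Reject anything that is not a plain non-negative integer.
    case $vlan in
        ''|*[!0-9]*) echo "invalid VLAN id: '$vlan'" >&2; return 1 ;;
    esac
    printf 'kadeploy3 -f %s -k -e %s --vlan %s\n' "$nodefile" "$env_name" "$vlan"
}

# Dry run with the values from the example above; the nodefile argument is
# single-quoted so that the variable name is printed literally.
kadeploy_vlan_cmd '$OAR_NODEFILE' lenny-x64-base 4
```

Inside the job shell on a frontend, the third argument would typically be <code>"$(kavlan -V)"</code>.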
 
 
 
== Change the VLAN of your nodes manually==
 
You can also change the VLAN manually, although it is much simpler to let kadeploy do it.
 
To change the VLAN of the nodes manually, you must reconfigure their network after the VLAN has changed; but once the VLAN has changed, you can no longer connect to the nodes to do so. An easy way around this is to schedule the reconfiguration beforehand with the <code class='command'>at</code> command (<code class='command'>apt-get install at</code> if it's not installed on your nodes).
 
We will use [[Using_TakTuk|TakTuk]] to start remote commands on several nodes at once. In this example, we will use all the nodes. Since taktuk does not handle duplicate names in the nodefile, we must first remove duplicates.
 
First, we use taktuk to install <code class='command'>at</code> on all nodes; a second taktuk command then schedules the network reconfiguration to run in one minute. Finally, we set the VLAN of all our nodes.
 
<pre class="brush: bash">
$ uniq $OAR_NODEFILE > ./mynodes
$ taktuk -s -l root -f ./mynodes broadcast exec [ "apt-get update; apt-get --yes install at" ]
$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
$ kavlan -s
Take node list from OAR nodefile: /var/lib/oar/387465
... node azur-25.sophia.grid5000.fr changed to vlan KAVLAN-7
... node azur-28.sophia.grid5000.fr changed to vlan KAVLAN-7
... node azur-30.sophia.grid5000.fr changed to vlan KAVLAN-7
all nodes are configured in the vlan 7
</pre>
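
Note that <code>uniq</code> only collapses adjacent duplicate lines, so the first step above relies on the nodefile listing each node's entries consecutively. If that is not guaranteed, <code>sort -u</code> is a safer variant. A minimal sketch (the helper name <code>unique_nodes</code> is made up for illustration):

```shell
# Hypothetical helper: write each node name once, whatever the input order.
# Usage: unique_nodes NODEFILE > ./mynodes
unique_nodes() {
    sort -u "$1"
}

# Demonstration on a throwaway nodefile with non-adjacent duplicates.
demo=$(mktemp)
printf '%s\n' azur-25.sophia.grid5000.fr azur-28.sophia.grid5000.fr \
              azur-25.sophia.grid5000.fr > "$demo"
unique_nodes "$demo"
rm -f "$demo"
```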
 
After one minute, your nodes will renegotiate their IP addresses and become available inside the VLAN. To get the names of your nodes in the VLAN, use the ''-l'' option:
<pre class="brush: bash">
$ kavlan -l
azur-25-kavlan-7.sophia.grid5000.fr
azur-28-kavlan-7.sophia.grid5000.fr
azur-30-kavlan-7.sophia.grid5000.fr
</pre>
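
The names above follow a visible pattern: <code>-kavlan-<VLANID></code> is inserted after the short host name. The sketch below reproduces that pattern with shell parameter expansion, which can be handy when building host lists before the VLAN is set. The helper <code>vlan_hostname</code> is hypothetical; <code>kavlan -l</code> remains the authoritative source.

```shell
# Hypothetical helper: map a default-VLAN node name to its name inside a VLAN,
# following the pattern seen above
# (azur-25.sophia.grid5000.fr -> azur-25-kavlan-7.sophia.grid5000.fr).
# Usage: vlan_hostname NODE VLANID
vlan_hostname() {
    node=$1 vlan=$2
    case $node in
        *.*) printf '%s-kavlan-%s.%s\n' "${node%%.*}" "$vlan" "${node#*.}" ;;
        *)   printf '%s-kavlan-%s\n' "$node" "$vlan" ;;   # short name, no domain
    esac
}

vlan_hostname azur-25.sophia.grid5000.fr 7
vlan_hostname azur-30 7
```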
 
You can connect to each of them using kaconsole or ssh (with ssh, you must first connect to the gateway of the VLAN):
<pre class="brush: bash">
$ VLANID=`kavlan -V`
$ ssh kavlan-$VLANID
kavlan-7@sophia$ ssh root@azur-25-kavlan-7
</pre>
 
You can use the <code class='command'>ip neigh</code> command to see the known hosts in your LAN; you should only see IPs in the 192.168.66.0/24 subnet:
<pre class="brush: bash">
azur-25-kavlan-7:~$ ip neigh
192.168.66.250 dev eth0  INCOMPLETE
192.168.66.254 dev eth0 lladdr 02:00:00:00:01:02 REACHABLE
</pre>
 
You should be able to ping another of your hosts inside your VLAN:
<pre class="brush: bash">
azur-25-kavlan-7:~# ping -c 3 azur-30-kavlan-7
PING azur-30-kavlan-7.sophia.grid5000.fr (192.168.66.30) 56(84) bytes of data.
64 bytes from azur-30.local (192.168.66.30): icmp_seq=1 ttl=64 time=0.154 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=3 ttl=64 time=0.163 ms
 
--- azur-30-kavlan-7.sophia.grid5000.fr ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.154/0.162/0.170/0.012 ms
</pre>
 
You can configure ssh to make the connection through the gateway transparent, as described in the next section.
 
== Configure ssh to easily connect to nodes in a local VLAN ==
 
To use ssh transparently to access isolated nodes (local VLAN), add this to your .ssh/config file on the frontend:
 
<pre class="brush: bash;">
Host *-*-kavlan-1 *-*-kavlan-1.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-1 nc %h %p
Host *-*-kavlan-2 *-*-kavlan-2.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-2 nc %h %p
Host *-*-kavlan-3 *-*-kavlan-3.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-3 nc %h %p
</pre>
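
The three stanzas differ only in the VLAN ID, so they can also be generated. The sketch below prints one stanza per ID given on the command line; <code>kavlan_ssh_config</code> is a hypothetical helper, and its output should be reviewed before being appended to the ssh configuration.

```shell
# Hypothetical helper: print one ssh_config stanza per VLAN id given.
# Usage: kavlan_ssh_config 1 2 3 >> ~/.ssh/config
kavlan_ssh_config() {
    for id in "$@"; do
        printf 'Host *-*-kavlan-%s *-*-kavlan-%s.*.grid5000.fr\n' "$id" "$id"
        # %%h and %%p print the literal %h and %p tokens that ssh expands itself.
        printf '    ProxyCommand ssh -q -a -x kavlan-%s nc %%h %%p\n' "$id"
    done
}

# Print the stanzas for the three local VLANs.
kavlan_ssh_config 1 2 3
```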
 
Then you can simply use ssh <cluster>-<nodeid>-kavlan-<vlanid> to access the node, for example:
{{Term|location=frontend|cmd=<code class="command">ssh</code> root@<code class='replace'>NODE</code>-kavlan-<code class='replace'>VLANID</code>}}
 
== Put your nodes back into the default VLAN ==
 
First, save the list of your nodes' names inside the VLAN to a file:
{{Term|location=frontend|cmd=<code class='command'>kavlan</code> -l > mynodes-vlan}}
 
Before changing the VLAN, don't forget to schedule the network restart with taktuk:
{{Term|location=frontend|cmd=<code class='command'>taktuk</code> -s -l root -f ./mynodes-vlan broadcast exec [ "echo '/etc/init.d/networking restart' &#124;  at now + 1 minute " ]}}
 
Then you can put your nodes back in the default VLAN:
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -s -i DEFAULT -f $OAR_NODEFILE}}
 
You should be able to ping your nodes:
<pre class="brush: bash">
for i in `uniq $OAR_NODEFILE`; do ping -c 1 $i; done
PING azur-25.sophia.grid5000.fr (138.96.20.25) 56(84) bytes of data.
64 bytes from azur-25.sophia.grid5000.fr (138.96.20.25): icmp_seq=1 ttl=64 time=1002 ms
 
--- azur-25.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1002.910/1002.910/1002.910/0.000 ms
PING azur-28.sophia.grid5000.fr (138.96.20.28) 56(84) bytes of data.
64 bytes from azur-28.sophia.grid5000.fr (138.96.20.28): icmp_seq=1 ttl=64 time=1.23 ms
 
--- azur-28.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.234/1.234/1.234/0.000 ms
PING azur-30.sophia.grid5000.fr (138.96.20.30) 56(84) bytes of data.
64 bytes from azur-30.sophia.grid5000.fr (138.96.20.30): icmp_seq=1 ttl=64 time=1.25 ms
 
--- azur-30.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.259/1.259/1.259/0.000 ms
</pre>
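
Because the network restart is delayed by one minute, the nodes may not answer immediately. The sketch below retries a probe on every node until all of them answer or a retry limit is reached. The helper <code>wait_for_nodes</code> and its arguments are hypothetical; the probe defaults to a single <code>ping -c 1</code>, as in the loop above.

```shell
# Hypothetical helper: probe every node in NODEFILE until all answer,
# or give up after TRIES rounds. PROBE defaults to a single ping.
# Usage: wait_for_nodes NODEFILE [TRIES] [DELAY_SECONDS] [PROBE...]
wait_for_nodes() {
    nodefile=$1 tries=${2:-10} delay=${3:-10}
    # Drop the first three arguments; whatever remains is the probe command.
    if [ "$#" -ge 3 ]; then shift 3; else shift "$#"; fi
    [ "$#" -gt 0 ] || set -- ping -c 1
    round=1
    while [ "$round" -le "$tries" ]; do
        pending=0
        for node in $(sort -u "$nodefile"); do
            "$@" "$node" >/dev/null 2>&1 || pending=$((pending + 1))
        done
        [ "$pending" -eq 0 ] && return 0
        echo "round $round: $pending node(s) not answering yet" >&2
        round=$((round + 1))
        [ "$round" -le "$tries" ] && sleep "$delay"
    done
    return 1
}

# Demonstration with a stand-in probe (`true` always succeeds).
demo=$(mktemp)
echo localhost > "$demo"
wait_for_nodes "$demo" 1 0 true && echo "all nodes answer"
rm -f "$demo"
```

On a frontend, the probe argument would be left out so that ping is used, for example <code>wait_for_nodes "$OAR_NODEFILE" 12 10</code>.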
 
Another way to put nodes back into the default VLAN is to change the VLAN and then reboot the nodes with kareboot3:
 
{{Term|location=frontend|cmd=<code class='command'>kavlan</code> -s -i DEFAULT -f $OAR_NODEFILE}}
{{Term|location=frontend|cmd=<code class='command'>kareboot3</code> -f $OAR_NODEFILE -r simple_reboot}}
 
=Using the API=
 
Kavlan is also available through the API.
 
FIXME
=Advanced usage=
== Use a global VLAN and put nodes from several sites in the same VLAN ==
 
First, reserve a global VLAN on one site (here sophia) and 2 nodes each on lille, sophia and lyon:
{{Term|location=frontend|cmd=<code class="command">oargridsub</code> -t deploy -w 2:00:00 sophia:rdef="{\\\\\\\"type='kavlan-global'\\\\\\\"}/vlan=1+/nodes=2",lille:rdef=/nodes=2,lyon:rdef=/nodes=2 > oargrid.out}}
 
 
Get the oargrid Id and Job key from the output of oargridsub:
{{Term|location=frontend|cmd=<code class="command">export</code> OAR_JOB_KEY_FILE=`grep "SSH KEY" oargrid.out &#124; cut -f2 -d: &#124; tr -d " "`}}
{{Term|location=frontend|cmd=<code class="command">export</code> OARGRID_JOB_ID=`grep "Grid reservation id" oargrid.out &#124; cut -f2 -d=`}}
Get the node list using oargridstat:
{{Term|location=frontend|cmd=<code class="command">oargridstat</code> -w -l $OARGRID_JOB_ID  &#124; grep grid> ~/gridnodes}}
 
Then use kadeploy3 to deploy your image on all sites and change the VLAN:
{{Term|location=frontend|cmd=<code class="command">kadeploy3</code> -f gridnodes -a /home/nniclausse/public/openmx.dsc -k --multi-server -o ~/nodes.deployed --vlan <code class="replace">13</code>}}
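
The two <code>grep</code>/<code>cut</code> pipelines above can be wrapped in small functions. Note that the job ID pipeline above does not strip blanks, so the value may carry a leading space; the sketch below strips them in both cases. The function names are hypothetical, and the sample text is only an illustration: the exact wording of real oargridsub output is an assumption here.

```shell
# Hypothetical helpers: extract fields from saved oargridsub output.
# They mirror the grep/cut pipelines above and strip surrounding blanks.
grid_job_id() {
    grep "Grid reservation id" "$1" | cut -f2 -d= | tr -d ' '
}
grid_job_key() {
    grep "SSH KEY" "$1" | cut -f2 -d: | tr -d ' '
}

# Demonstration on an illustrative sample (assumed format, made-up values).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[OAR_GRIDSUB] Grid reservation id = 12345
[OAR_GRIDSUB] SSH KEY : /tmp/oargrid/oargrid_ssh_key_user_12345
EOF
grid_job_id "$sample"
grid_job_key "$sample"
rm -f "$sample"
```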
 
== Setup a DHCP server on your nodes ==
 
If you need to run your own DHCP server (for example, to run a cluster distribution inside KaVLAN or to test kadeploy), you can use a script to generate the configuration file.
 
Let's say that you want to install dhcpd on azur-25-kavlan-7. You first have to install a DHCP server on this node (we assume the node is not yet in the job VLAN):
{{Term|location=node|cmd=<code class="command">apt-get</code> install dhcp3-server}}
 
Then, download a script that will generate your dhcp configuration:
 
{{Term|location=frontend|cmd=<code class="command">wget</code>  http://public.sophia.grid5000.fr/~nniclausse/gen_dhcpd_conf.rb}}
 
(this script uses the <code>restfully</code> and <code>ruby-ip</code> gems)
 
 
Then generate the configuration (replace <SITE> and <VLANID> with your current site and VLAN ID, 7 in this example) and copy it to the node:
<pre class="brush: bash">
chmod +x ./gen_dhcpd_conf.rb
./gen_dhcpd_conf.rb --site <SITE> --vlan-id <VLANID>
scp dhcpd-kavlan-<VLANID>-<SITE>.conf root@node:/etc/dhcp3/dhcpd.conf
</pre>
 
 
Then we must isolate our nodes before starting the dhcp server:
<pre class="brush: bash">
frontend$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
frontend$ kavlan -s
</pre>
Wait one minute. Before starting your own server, disable the gateway's DHCP server from the frontend:
{{Term|location=frontend|cmd=<code class="command">kavlan -d</code>}}
Then, on the node chosen as the DHCP server, start it:
{{Term|location=node|cmd=<code class='command'>/etc/init.d/dhcp3-server</code> start}}
 
Then, in another shell, connect as root on a second node:
{{Term|location=frontend|cmd=<code class='command'>ssh</code> root@azur-30-kavlan-7}}
 
And restart the network configuration:
<pre class="brush: bash">
azur-30-kavlan-7:~# /etc/init.d/networking restart
Reconfiguring network interfaces...There is already a pid file /var/run/dhclient.eth1.pid with pid 5319
killed old client process, removed PID file
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
 
Listening on LPF/eth1/00:11:25:c4:d9:c5
Sending on  LPF/eth1/00:11:25:c4:d9:c5
Sending on  Socket/fallback
DHCPRELEASE on eth1 to 192.168.66.254 port 67
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
 
Listening on LPF/eth1/00:11:25:c4:d9:c5
Sending on  LPF/eth1/00:11:25:c4:d9:c5
Sending on  Socket/fallback
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 14
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4
DHCPOFFER from 192.168.66.19
DHCPREQUEST on eth1 to 255.255.255.255 port 67
DHCPACK from 192.168.66.19
bound to 192.168.66.2 -- renewal in 41122 seconds.
done.
</pre>
 
On the DHCP server, check the logs:
 
<pre class="brush: bash">
azur-25-kavlan-7:~# tail /var/log/syslog
Mar 17 16:22:51 azur-25 dhcpd: Copyright 2004-2008 Internet Systems Consortium.
Mar 17 16:22:51 azur-25 dhcpd: All rights reserved.
Mar 17 16:22:51 azur-25 dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 deleted host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 new dynamic host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 leases to leases file.
Mar 17 16:25:27 azur-25 dhcpd: DHCPDISCOVER from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPOFFER on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPREQUEST for 192.168.66.2 (192.168.66.19) from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPACK on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
</pre>
 
In the last four lines, you can see that your own DHCP server has given an address to the other node.
 
== DHCP + PXE ==
 
If you need to do PXE boot, you must change the tftp server in the dhcpd configuration file:
{{Term|location=node|cmd=IP=`hostname -i`}}
{{Term|location=node|cmd=<code class='command'>perl</code> -i -pe "s/next-server .*/next-server $IP;/" /etc/dhcp3/dhcpd.conf}}
If no next-server is configured, you must edit the file by hand and add a line like this:
next-server XX.XX.XX.XX;

where XX.XX.XX.XX is the IP of your node (<code class='command'>echo $IP</code>).
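
Both cases (replace an existing <code>next-server</code> statement, or add one if none exists) can be handled by one function. The sketch below is an assumption-laden alternative to the perl one-liner: <code>set_next_server</code> is a hypothetical helper, it only matches statements at the start of a line (after optional indentation), and when it appends, it does so at the end of the file, which assumes that a top-level statement is acceptable in your configuration.

```shell
# Hypothetical helper: set next-server in a dhcpd configuration file,
# replacing an existing statement or appending one if none is present.
# Usage: set_next_server CONF_FILE IP
set_next_server() {
    conf=$1 ip=$2
    tmp=$(mktemp) || return 1
    if grep -q '^[[:space:]]*next-server[[:space:]]' "$conf"; then
        # Keep the original indentation, replace the rest of the line.
        sed "s/^\([[:space:]]*\)next-server[[:space:]].*/\1next-server $ip;/" "$conf" > "$tmp"
    else
        { cat "$conf"; printf 'next-server %s;\n' "$ip"; } > "$tmp"
    fi
    # Write back through cat so the permissions of CONF_FILE are preserved.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Demonstration on a throwaway file; both addresses are placeholders.
demo=$(mktemp)
printf 'subnet 192.168.66.0 netmask 255.255.255.0 {\n  next-server 10.0.0.1;\n}\n' > "$demo"
set_next_server "$demo" 192.168.66.19
cat "$demo"
rm -f "$demo"
```

On the node, the call would be <code>set_next_server /etc/dhcp3/dhcpd.conf "$IP"</code>.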

Revision as of 16:14, 11 April 2011


Overview

The goal of KaVLAN is to let users manage VLANs on Grid'5000 nodes. The benefit is complete layer 2 isolation. It can be used together with OAR and Kadeploy to run experiments on the grid.

Installation status by site:

Site      Version  Status
Bordeaux  -        Fail
Grenoble  -        Fail
Lille     1.1.6    OK
Lyon      1.1.6    OK
Nancy     -        In progress
Orsay     -        Fail
Rennes    -        In progress
Sophia    1.1.6    OK
Toulouse  -        Fail

There are 3 types of vlan available:

Isolated VLAN

An isolated VLAN (also called local VLAN) is completely isolated from the rest of Grid5000: no routing configured. Therefore, you must use a special host to reach your nodes inside this kind of VLAN.

The following figure shows two jobs running with KaVLAN: each job has its nodes isolated in a local VLAN (purple and green). The other nodes are all in the default VLAN (red). The only way to reach the isolated nodes is to use a gateway node (kavlan-1 and kavlan-2 in the figure). The gateway has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. This way, you can use ssh to reach your nodes (another way to reach an isolated node is to use the kaconsole command).

[Figure: KaVLAN architecture: 2 jobs running KaVLAN]

Note: The gateways are NOT doing any routing: they are only used as ssh gateways.

Routed VLAN

This kind of VLAN is not isolated at layer 3. You can therefore reach the nodes inside the VLAN from the rest of Grid5000.

Global VLAN

A global VLAN is a VLAN that spans all Grid5000 sites (using QinQ). You can therefore put nodes from different sites in the same VLAN. There is no routing between these nodes. To reach nodes inside the VLAN, routing is configured on one of the routers (on the site where you reserved the global VLAN).

Usage

How to reserve a VLAN

KaVLAN only works with deploy reservations. To obtain nodes and a VLAN, you must reserve a kavlan resource with oarsub. There are 3 kinds of resources: kavlan, kavlan-local and kavlan-global. For example, if you need 3 nodes and a local VLAN:

frontend:
oarsub -t deploy -l {"type='kavlan-local'"}/vlan=1+/nodes=3 -I

Then you can get the ID of your VLAN using the kavlan command:

frontend:
kavlan -V

If you run this command outside the shell started by OAR for your reservation, you must add the OAR job ID:

frontend:
kavlan -V -j JOBID

You should get an integer in the <1-3> range for a local VLAN (the range for routed VLANs is <4-9>, and there is one global VLAN per OAR server).
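
As a quick sanity check, the ranges quoted above can be turned into a small classifier. This is only a sketch: vlan_kind is a hypothetical helper, it encodes just the two ranges stated here (1-3 local, 4-9 routed), and the ranges actually configured on a given site may differ.

```shell
# Hypothetical helper: classify a VLAN id using the ranges quoted above
# (1-3 local, 4-9 routed); anything else is reported as "other".
vlan_kind() {
    case $1 in
        [1-3]) echo local ;;
        [4-9]) echo routed ;;
        *)     echo other ;;
    esac
}

vlan_kind 2
vlan_kind 13
```

Inside a job shell, the argument would typically be "$(kavlan -V)".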

You can get all the options of the command using --help:

# kavlan --help
Usage: kavlan [options]
Specific options:
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
    -C, --ca-cert CA                 CA certificate
    -c, --client-cert CERT           client certificate
    -k, --client-key KEY             client key
    -l, --get-nodelist               Show nodenames in the given vlan
    -e, --enable-dhcp                Start DHCP server
    -d, --disable-dhcp               Stop DHCP server
    -V, --show-vlan-id               Show vlan id of job (needs -j JOBID)
    -g, --get-vlan                   Show vlan of nodes
    -s, --set-vlan                   Set vlan of nodes
    -j, --oar-jobid JOBID            OAR job id
    -m, --machine NODE               set nodename (several -m are OK)
    -f, --filename NODEFILE          read nodes from a file
    -u, --user USERNAME              username
    -v, --[no-]verbose               Run verbosely
    -q, --[no-]quiet                 Run quietly
        --[no-]debug                 Run with debug output
    -h, --help                       Show this message
        --version                    Show version

Once you have a kavlan reservation running, you can put your nodes in your VLAN (and back into the default VLAN) at any time during the lifetime of your job. For a local VLAN, you are also allowed to connect to the VLAN gateway named kavlan-<ID>, where ID is your VLAN ID.


If you want to learn how to use KaVLAN, you can try the tutorial on Network isolation on Grid'5000.