KaVLAN: Difference between revisions

From Grid5000
 
(156 intermediate revisions by 14 users not shown)
Line 1: Line 1:
{{Maintainer|Nicolas Niclausse}}
{{Maintainer|Pierre Neyron}}
{{Portal|User}}
{{Portal|Tutorial}}
{{Portal|Network}}
{{Status|Draft}}
{{Status|In production}}
{{Pages|KaVLAN}}
{{TutorialHeader}}
__FORCETOC__
__FORCETOC__


= Overview =
= Overview =
The goal of KaVLAN is to let users manage VLANs on Grid'5000 nodes. The benefit is complete level 2 isolation. It can be used together with OAR and Kadeploy to run experiments on the grid.
[[Image:fig-kavlan.png|thumb|340px|alt="KaVLAN scheme"|KaVLAN big picture]]


Installation status on sites :
[[KaVLAN]] provides ''network isolation capabilities'' for Grid'5000 users' experiments, via a high-level, user-driven interface to '''[https://en.wikipedia.org/wiki/Virtual_LAN VLANs (802.1Q)]'''.


{| class="checks" style="width: auto;"
Said differently: [[KaVLAN]] allows users to manage VLANs for the network connection of their Grid'5000 nodes.  
! class="left" |Sites
! Version
! Status
|-
| class="left" |[[Bordeaux:Home|Bordeaux]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Grenoble:Home|Grenoble]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Lille:Home|Lille]]
| 1.1.4
| [[Image:Check.png]]
|-
| class="left" |[[Lyon:Home|Lyon]]
| 1.1.4
| [[Image:Check.png]]
|-
| class="left" |[[Nancy:Home|Nancy]]
|
| [[Image:InProgress.png]]
|-
| class="left" |[[Orsay:Home|Orsay]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Rennes:Home|Rennes]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Sophia:Home|Sophia]]
| 1.1.4
| [[Image:Check.png]]
|-
| class="left" |[[Toulouse:Home|Toulouse]]
|
| [[Image:Fail.png]]
|}


{{Note|text=A new version of KaVLAN is currently being installed on Grid'5000. This new version adds two kinds of VLAN (routed and global). This tutorial is currently being rewritten to document the new features.}}
Behind the scenes, [[KaVLAN]] actually changes the configuration of the network switches of Grid'5000 infrastructure, to set the VLAN membership (VLAN ID) for the ports which are cabled to the network interfaces of one or more nodes.


There are 3 types of vlan available:
'''The benefit is a complete level 2 isolation for users' experiments.'''
== Isolated VLAN ==


An isolated VLAN (also called '''local VLAN''') is completely isolated from the rest of Grid'5000: no routing is configured. Therefore, you must use a special host to reach your nodes inside this kind of VLAN.
''It is however important to note that KaVLAN does not guarantee performance isolation: on sites with a hierarchical network (such as [[Nancy:Network|Nancy]]), inter-switch links may indeed be shared between various VLANs/experiments.''


The following figure shows two jobs running with KaVLAN: each job has its nodes isolated in a local VLAN (purple and green). The other nodes are all in the default VLAN (red). The only way to reach the isolated nodes is to use a gateway node (kavlan-1 and kavlan-2 in the figure). The ''gateway'' has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. This way, you can use ssh to reach your nodes (another way to reach an isolated node is to use the <code class='command'>kaconsole</code> command).
For experiments involving network reconfiguration, [[KaVLAN]] is to be used together with OAR and Kadeploy (to reserve resources and to gain control over the operating system and network configuration of the nodes).
[[Image:kavlan.png|450px|center|thumbnail|KaVLAN architecture: 2 jobs running KaVLAN]]


{{Note|text=The gateways are NOT doing any routing: they are only used as ssh gateways.}}
= The 3 KaVLAN VLAN types =
3 types of VLANs are available for users in Grid'5000: '''local''', '''routed''' and '''global'''.
{{Kvlan-types-and-id}}


== Routed VLAN ==
See the 2 schemas on the right of this page, which illustrate [[KaVLAN]] big picture and architecture.


This kind of VLAN is not isolated at layer 3. Therefore you can reach the nodes inside the VLAN from the rest of Grid'5000.
== 1: Local VLAN ==


== Global VLAN ==
From the IP routing point of view, a ''local VLAN'' is completely '''isolated''' from the rest of Grid'5000. '''No IP routing is configured in any router of the infrastructure'''. Therefore, to reach your nodes inside that kind of VLAN, the Grid'5000 infrastructure provides a special host you can hop through: '''the SSH gateway of the VLAN'''. For each local VLAN, the hostname of that SSH gateway is: ''kavlan-<code class="replace">ID</code>''.


A global VLAN is a VLAN that spans all Grid'5000 sites (using QinQ). Therefore you can configure nodes of different sites in the same VLAN. There is no routing between these nodes. To reach nodes inside the VLAN from outside, routing is configured on one of the routers (on the site where you reserved this global VLAN).


=Usage=
Then you can connect to any of your nodes within the VLAN using hostnames such as ''<code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>'' (adding the suffix ''-kavlan-'' + the ''VLAN_ID'' to the regular hostname), for instance from the SSH gateway of the VLAN, or from node to node (with the default provided DNS configuration in the VLAN).
== How to reserve a VLAN ==


KaVLAN only works with ''deploy'' reservations; to obtain nodes and a VLAN, you must reserve a kavlan resource with <code class="command">oarsub</code>. There are 3 kinds of resources: '''kavlan''', '''kavlan-local''', '''kavlan-global'''. For example, if you need 3 nodes and a local VLAN:
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -t deploy -l {"type='kavlan-local'"}/vlan=1+/nodes=3 -I}}


Then you can get the ID of your VLAN using the <code class="command">kavlan</code> command:
The figure below shows two jobs using KaVLAN: each job has its nodes isolated in a ''local VLAN'' ('''<font color="green">green</font>''' and '''<font color="purple">purple</font>'''). The other nodes are all in the default VLAN ('''<font color="red">red</font>'''). The only way to reach the isolated nodes is to hop through the VLAN's ''SSH gateway'' machine (kavlan-1 and kavlan-2 in the figure). Technically speaking, the ''SSH gateway'' has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. Another way to reach an isolated node is to use the <code class='command'>kaconsole</code> command.
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V}}
[[Image:kavlan.png|600px|center|thumbnail|KaVLAN architecture: 2 jobs running KaVLAN]]


If you run this command outside the shell started by OAR for your reservation, you must add the oar JOBID.
{{Note|text=Please note that:
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V -j <code class="replace">JOBID</code>}}
* as your nodes are isolated from the rest of Grid'5000, NFS mounts of the /home partition are not possible. Therefore, '''Grid'5000 environments that mount the /home partition (-nfs, -big, -std) may fail to boot'''.}}


You should get an integer in the <1-3> range for a local VLAN (the range for routed VLANs is <4-9>, and there is one global VLAN per OAR server).
== 2: Routed VLAN ==


You can get all the options of the command using --help:
Unlike ''local VLANs'', which are isolated, '''''routed VLANs'' are not isolated at layer 3: IP packets are routed'''. Therefore you can reach the nodes inside a ''routed VLAN'' from the rest of Grid'5000 (e.g. from the default VLAN, or from another ''routed VLAN''). There is no need to hop through an SSH gateway, as is the case for ''local VLANs''.
<pre class="brush: bash">
# kavlan --help
Usage: kavlan [options]
Specific options:
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
    -C, --ca-cert CA                CA certificate
    -c, --client-cert CERT          client certificate
    -k, --client-key KEY            client key
    -l, --get-nodelist              Show nodenames in the given vlan
    -e, --enable-dhcp                Start DHCP server
    -d, --disable-dhcp              Stop DHCP server
    -V, --show-vlan-id              Show vlan id of job (needs -j JOBID)
    -g, --get-vlan                  Show vlan of nodes
    -s, --set-vlan                  Set vlan of nodes
    -j, --oar-jobid JOBID            OAR job id
    -m, --machine NODE              set nodename (several -m are OK)
    -f, --filename NODEFILE          read nodes from a file
    -u, --user USERNAME              username
    -v, --[no-]verbose              Run verbosely
    -q, --[no-]quiet                Run quietly
        --[no-]debug                Run with debug output
    -h, --help                      Show this message
        --version                    Show version
</pre>


Once you have a kavlan reservation running, you can put your nodes in your VLAN (and back into the default VLAN) at any time during the lifetime of your job. For a local VLAN, you are also allowed to connect to the VLAN gateway named <code class='hostname'>kavlan-<ID></code>, where ID is your VLAN ID.
Nodes in the VLAN are reachable with the following hostname: ''<code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>'' (same naming scheme as for ''local VLANs''), from the frontends of the sites for instance.


Since KaVLAN works only with deploy jobs, the next step is to deploy at least one node (otherwise, you won't have root access on it and therefore can't restart its network configuration).
== 3: Global VLAN ==


'''''Global VLANs'' are VLANs which span all Grid'5000 sites'''. Therefore you can configure nodes of different sites in the same ''global VLAN'', i.e. in the same Ethernet network (no inter-site IP routing is required; nodes in a global VLAN share the same broadcast domain).


== Enable/disable the dhcp server of the gateway ==
(Underneath, they use the [https://en.wikipedia.org/wiki/IEEE_802.1ad IEEE 802.1ad] encapsulation, also known as QinQ, to provide the same layer 2 network across all sites.)
Once you have reserved a VLAN, you are allowed to start/stop the DHCP server of the VLAN.
Do this with the kavlan command (add ''-j JOBID'' if needed) :
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -e}}


(You can disable the DHCP server with <code class='command'>kavlan -d</code>)
There is exactly one ''global VLAN'' per site. If that VLAN is already reserved by another user, you can try to get one from another site. '''The reservation must be made on the site of the ''global VLAN''.'''


Since all its nodes share the same layer 2 network, no routing is required between nodes placed in a ''global VLAN'' (even from site to site).


To reach nodes inside a ''global VLAN'' from outside, routing is configured on the router of the site where the ''global VLAN'' is reserved.
The hostnames of nodes within a VLAN follow the same scheme as above: ''<code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>''.


== Deploy nodes and change VLAN in one step ==
{{Note|text=Please mind that there is no performance isolation between the ''global VLANs'' and the Grid'5000 inter-site VLAN (backbone VLAN): all share the same inter-site ''physical'' link.}}


Kadeploy is now able to change the VLAN of the nodes it deploys:
{{Note|text=In Grid'5000's vlan API, the vlan type kavlan-global-remote can also be found. It is the same type as kavlan-global; the distinction is due to kavlan's internal machinery.|inline=1}}
Let's say you want to deploy all nodes using the lenny-x64-base environment, in your VLAN (id=4):
{{Term|location=frontend|cmd=<code class="command">kadeploy3</code> -f $OAR_NODEFILE -k -e <code class="replace">lenny-x64-base</code> --vlan <code class="replace">4</code> }}


[[Image:Kavlan_admin.png|800px|center|thumb|KaVLAN architecture: see ''local VLANs'' in '''<font color="green">green</font>''', ''routed VLANs'' in '''<font color="blue">blue</font>''', ''global VLANs'' in '''<font color="purple">purple</font>''' and the default VLAN in '''<font color="red">red</font>''']]




== Change the VLAN of your nodes manually==
= Reserving a VLAN =


If you really want to change the VLAN manually, you can, but it's much simpler to change the vlan with kadeploy.
Using KaVLAN requires using '''deploy reservations for the nodes''', because it necessarily involves reconfiguring the network stack of the operating system of the nodes.


In order to change the VLAN of the nodes manually, you must reconfigure the network after the vlan has changed; but once the VLAN has changed, you can't connect to the node! An easy way to do this is to use the 'at' command (<code class='command'>apt-get install at</code> if it's not installed in your nodes)
To obtain both nodes and a VLAN, you must reserve kavlan resources (VLAN-IDs) with OAR using the <code class="command">oarsub</code> command.  OAR manages the 3 kinds of VLAN resources, shown in the table above:
* '''kavlan'''
* '''kavlan-local'''
* '''kavlan-global'''  


We will use [[Using_TakTuk|Taktuk]] to start remote commands on several nodes at once. In this example, we will use all the nodes. Since taktuk does not handle duplicate names in the nodefile, we must first remove duplicates.
For example, if you need 3 nodes and a local VLAN, you can run:
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -t deploy -l {"type='kavlan-local'"}/vlan=1+/nodes=3 -I}}


First, we will use taktuk to install <code class='command'>at</code> on all nodes, then the taktuk command will simply launch the network reconfiguration in one minute. Finally, we set the VLAN of all our nodes.
Then you can get the ID of your VLAN using the <code class="command">kavlan</code> command:
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V}}


<pre class="brush: bash">
If you need to run that command from outside the shell which is started by OAR for your reservation, you have to give the OAR ''JOBID''.
$ uniq $OAR_NODEFILE > ./mynodes
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V -j <code class="replace">JOBID</code>}}
$ taktuk -s -l root -f ./mynodes broadcast exec [ "apt-get update; apt-get --yes install at" ]
$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
$ kavlan -s
Take node list from OAR nodefile: /var/lib/oar/387465
... node azur-25.sophia.grid5000.fr changed to vlan KAVLAN-7
... node azur-28.sophia.grid5000.fr changed to vlan KAVLAN-7
... node azur-30.sophia.grid5000.fr changed to vlan KAVLAN-7
all nodes are configured in the vlan 7
</pre>
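The <code class='command'>uniq $OAR_NODEFILE</code> step above only collapses duplicate lines that are adjacent. OAR nodefiles usually list repeated entries consecutively, so it works in practice, but a sort-based variant does not depend on the ordering. A minimal sketch (the function name <code>dedupe_nodefile</code> is a hypothetical helper, not part of OAR or KaVLAN):

```shell
# Print each node of an OAR nodefile exactly once, whatever the order of
# the entries. dedupe_nodefile is a hypothetical helper name.
dedupe_nodefile() {
    sort -u "$1"
}

# Typical use on the frontend, inside the OAR job shell:
#   dedupe_nodefile "$OAR_NODEFILE" > ./mynodes
```

Note that the output is sorted, so the original order of the nodefile is not preserved.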
 
In one minute, your nodes will renegotiate their IP addresses and will be available inside the VLAN. To get the name of your nodes in the VLAN, use the ''-l'' option:
<pre class="brush: bash">
$ kavlan -l
azur-25-kavlan-7.sophia.grid5000.fr
azur-28-kavlan-7.sophia.grid5000.fr
azur-30-kavlan-7.sophia.grid5000.fr
</pre>
 
You can connect to each of them using kaconsole or ssh (first, you must connect to the gateway of the vlan):
<pre class="brush: bash">
$ VLANID=`kavlan -V`
$ ssh kavlan-$VLANID
kavlan-7@sophia$ ssh root@azur-25-kavlan-7
</pre>
 
You can use the <code class='command'>ip neigh</code> command to see the known hosts in your LAN; you should only see IPs in the 192.168.66.0/24 subnet
<pre class="brush: bash">
azur-25-kavlan-7:~$ ip neigh
192.168.66.250 dev eth0  INCOMPLETE
192.168.66.254 dev eth0 lladdr 02:00:00:00:01:02 REACHABLE
</pre>
 
You should be able to ping another of your hosts inside your VLAN:
<pre class="brush: bash">
azur-25-kavlan-7:~# ping -c 3 azur-30-kavlan-7
PING azur-30-kavlan-7.sophia.grid5000.fr (192.168.66.30) 56(84) bytes of data.
64 bytes from azur-30.local (192.168.66.30): icmp_seq=1 ttl=64 time=0.154 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=3 ttl=64 time=0.163 ms
 
--- azur-30-kavlan-7.sophia.grid5000.fr ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.154/0.162/0.170/0.012 ms
</pre>
 
You can configure ssh to make the connection through the gateway transparent:


== Configure ssh to easily connect to nodes in a local VLAN ==


In order to use ssh transparently to access isolated nodes (local VLAN), you should add this to your .ssh/config file on the frontend:
Either way, you should get an integer '''VLAN ID''':
* '''in the [1-3] range for local VLANs'''
* '''in the [4-9] range for routed VLANs'''
* '''≥ 10 for global VLANs'''
{{Note|text=Reminder: only one global VLAN ID is available per site. On a given site, it is only possible to reserve the global VLAN associated with that site. Once reserved, however, a global VLAN can be used on all sites.}}


<pre class="brush: bash;">
= IP addresses associated with VLANs =
Host *-*-kavlan-1 *-*-kavlan-1.*.grid5000.fr
See below the KaVLAN ID, and associated IP subnets ('''served by DHCP in the VLANs''')
    ProxyCommand ssh -q -a -x kavlan-1 nc %h %p
{{Template:KaVLAN IP Network Golden rules}}
Host *-*-kavlan-2 *-*-kavlan-2.*.grid5000.fr
(More info in the [[Grid5000:Network|Network page]])
    ProxyCommand ssh -q -a -x kavlan-2 nc %h %p
Host *-*-kavlan-3 *-*-kavlan-3.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-3 nc %h %p
</pre>


Then you can simply use ssh <cluster>-<nodeid>-kavlan-<vlanid> to access the node, for example:
;A DHCP service is provided in all VLANs (local, routed, and global):
{{Term|location=frontend|cmd=<code class="command">ssh</code> root@<code class='replace'>NODE</code>-kavlan-<code class='replace'>VLANID</code>}}
Once the network interfaces of some nodes have been put in a VLAN, they must obtain IP addresses valid within that VLAN: the interfaces can be brought ''down-up'', the networking service of the operating system can be restarted, or the node can be rebooted.


== Put your nodes back into the default VLAN ==
{{Note|text=When changing the network configuration, <code class=command>kaconsole3</code> may prove very handy. The <code class=command>at</code> command can also be used to run a command asynchronously and work around the network disconnection, e.g. <pre>... | at now + 1 minute</pre>}}


First, put the list of the in-VLAN names of your nodes in a file:
If needed for your experiment, please note that the '''<code class="command">kavlan</code>''' command can '''deactivate the DHCP service''' in a VLAN (see below).
{{Term|location=frontend|cmd=<code class='command'>kavlan</code> -l > mynodes-vlan}}


Don't forget to first start the network restarting command with taktuk:
Reminder: for local VLANs, you are also allowed to ssh to the VLAN's SSH gateway, which is named kavlan-<code class='replace'>ID</code>.
{{Term|location=frontend|cmd=<code class='command'>taktuk</code> -s -l root -f ./mynodes-vlan broadcast exec [ "echo '/etc/init.d/networking restart' &#124;  at now + 1 minute " ]}}


Then you can put your nodes back in the default VLAN:
= Setting up the VLAN =
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -s -i DEFAULT -f $OAR_NODEFILE}}
Configuring the VLANs is done with the  '''<code class="command">kavlan</code>''' command.


You should be able to ping your nodes:
All the options of the command can be shown using ''--help'', as follows:
<pre class="brush: bash">
<pre class="brush: bash">
for i in `uniq $OAR_NODEFILE`; do ping -c 1 $i; done
# kavlan --help
PING azur-25.sophia.grid5000.fr (138.96.20.25) 56(84) bytes of data.
Usage: kavlan [options]
64 bytes from azur-25.sophia.grid5000.fr (138.96.20.25): icmp_seq=1 ttl=64 time=1002 ms
Specific options:
 
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
--- azur-25.sophia.grid5000.fr ping statistics ---
    -C, --ca-cert CA                CA certificate
1 packets transmitted, 1 received, 0% packet loss, time 0ms
    -c, --client-cert CERT          client certificate
rtt min/avg/max/mdev = 1002.910/1002.910/1002.910/0.000 ms
    -k, --client-key KEY            client key
PING azur-28.sophia.grid5000.fr (138.96.20.28) 56(84) bytes of data.
    -l, --get-nodelist              Show nodenames in the given vlan
64 bytes from azur-28.sophia.grid5000.fr (138.96.20.28): icmp_seq=1 ttl=64 time=1.23 ms
    -e, --enable-dhcp                Start DHCP server
 
    -d, --disable-dhcp              Stop DHCP server
--- azur-28.sophia.grid5000.fr ping statistics ---
    -V, --show-vlan-id              Show vlan id of job (needs -j JOBID)
1 packets transmitted, 1 received, 0% packet loss, time 0ms
    -g, --get-vlan                  Show vlan of nodes
rtt min/avg/max/mdev = 1.234/1.234/1.234/0.000 ms
    -s, --set-vlan                  Set vlan of nodes
PING azur-30.sophia.grid5000.fr (138.96.20.30) 56(84) bytes of data.
    -j, --oar-jobid JOBID            OAR job id
64 bytes from azur-30.sophia.grid5000.fr (138.96.20.30): icmp_seq=1 ttl=64 time=1.25 ms
    -m, --machine NODE              set nodename (several -m are OK)
 
    -f, --filename NODEFILE          read nodes from a file
--- azur-30.sophia.grid5000.fr ping statistics ---
    -u, --user USERNAME              username
1 packets transmitted, 1 received, 0% packet loss, time 0ms
    -v, --[no-]verbose              Run verbosely
rtt min/avg/max/mdev = 1.259/1.259/1.259/0.000 ms
    -q, --[no-]quiet                Run quietly
        --[no-]debug                Run with debug output
    -h, --help                      Show this message
        --version                    Show version
</pre>
</pre>


Another way to put back nodes into the default VLAN is to change the vlan and then kareboot the nodes.
So, once you have a ''kavlan'' job running and know your VLAN ID, you can use the '''<code class="command">kavlan</code>''' command to put some network interfaces of your nodes in your VLAN (and later, back into the default VLAN) at any time during the lifetime of your job.


{{Term|location=frontend|cmd=<code class='command'>kavlan</code> -s -i DEFAULT -f $OAR_NODEFILE}}
In case of a node with multiple cabled network interfaces, each of them can be used, with the following naming:
{{Term|location=frontend|cmd=<code class='command'>kareboot</code> -f $OAR_NODEFILE -r simple_reboot}}
* for the default interface: <code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>
 
* for other interfaces: <code class="replace">hostname-X</code>-eth<code class="replace">Y</code>-kavlan-<code class="replace">ID</code>
=Using the API=
 
Kavlan is also available through the API.
 
FIXME
=Advanced usage=
== Setup a DHCP server on your nodes ==
 
If you need to run your own DHCP server (for example if you want to run a cluster distribution inside kavlan or test kadeploy ), you can use the configuration file available on the VLAN's gateway.
 
Let's say that you want to install dhcpd on azur-25-kavlan-7. You first have to install a DHCP server on this node (we assume the node is not yet in the job VLAN):
{{Term|location=node|cmd=<code class="command">apt-get</code> install dhcp3-server}}
 
Then, download a script that will generate your dhcp configuration:
 
{{Term|location=frontend|cmd=<code class="command">wget</code> http://public.sophia.grid5000.fr/~nniclausse/gen_dhcpd_conf.rb}}
 
(this script uses the <code>restfully</code> and <code>ruby-ib</code> gems)
 
 
Then, generate the configuration (replace <SITE> and <VLANID> with your current site and VLAN ID (7 in the example)) and copy it to the node:
<pre class="brush: bash">
chmod +x ./gen_dhcpd_conf.rb
./gen_dhcpd_conf.rb --site <SITE> --vlan-id <VLANID>
scp dhcpd-kavlan-<VLANID>-<SITE>.conf root@node:/etc/dhcp3/dhcpd.conf
</pre>
 
 
Then we must isolate our nodes before starting the dhcp server:
<pre class="brush: bash">
frontend$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
frontend$ kavlan -s
</pre>
Wait one minute, and then you can start the server, once you have disabled the gateway's DHCP server.
On the frontend {{Term|location=frontend|cmd=<code class="command">kavlan -d</code>}}
then, on the node chosen as the DHCP server:
{{Term|location=node|cmd=<code class='command'>/etc/init.d/dhcp3-server</code> start}}
 
Then, in another shell, connect as root on a second node:
{{Term|location=frontend|cmd=<code class='command'>ssh</code> root@azur-30-kavlan-7}}
 
And restart the network configuration:
<pre class="brush: bash">
azur-30-kavlan-7:~# /etc/init.d/networking restart
Reconfiguring network interfaces...There is already a pid file /var/run/dhclient.eth1.pid with pid 5319
killed old client process, removed PID file
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
 
Listening on LPF/eth1/00:11:25:c4:d9:c5
Sending on  LPF/eth1/00:11:25:c4:d9:c5
Sending on  Socket/fallback
DHCPRELEASE on eth1 to 192.168.66.254 port 67
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
 
Listening on LPF/eth1/00:11:25:c4:d9:c5
Sending on  LPF/eth1/00:11:25:c4:d9:c5
Sending on  Socket/fallback
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 14
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4
DHCPOFFER from 192.168.66.19
DHCPREQUEST on eth1 to 255.255.255.255 port 67
DHCPACK from 192.168.66.19
bound to 192.168.66.2 -- renewal in 41122 seconds.
done.
</pre>
 
On the DHCP server, check the logs:
 
<pre class="brush: bash">
azur-25-kavlan-7:~# tail /var/log/syslog
Mar 17 16:22:51 azur-25 dhcpd: Copyright 2004-2008 Internet Systems Consortium.
Mar 17 16:22:51 azur-25 dhcpd: All rights reserved.
Mar 17 16:22:51 azur-25 dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 deleted host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 new dynamic host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 leases to leases file.
Mar 17 16:25:27 azur-25 dhcpd: DHCPDISCOVER from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPOFFER on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPREQUEST for 192.168.66.2 (192.168.66.19) from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPACK on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
</pre>


In the last four lines, you can see that your own DHCP server has given an address to the other node.
{{Note|text=You may notice that the hostname for secondary interfaces has the form "<code class="replace">hostname-X</code>-eth<code class="replace">Y</code>-kavlan-<code class="replace">ID</code>", while the name of the interface in the system is "<code class="replace">enoY</code>" or "<code class="replace">enpYsZ</code>".
This is due to changes in the naming of interfaces since Debian 9 (see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/).<br/>
KaVLAN hostnames still use the old interface names. If you are not sure which name corresponds to which interface, both namings (old and new) are described in the API.
For example, '''grisou-2-eth1-kavlan-1.nancy.grid5000.fr''' corresponds to interface '''eno2''' on '''grisou-2''' ( https://api.grid5000.fr/stable/sites/nancy/clusters/grisou/nodes/grisou-2.json )
}}


== DHCP + PXE ==


If you need to do PXE boot, you must change the tftp server in the dhcpd configuration file:
; Going Further
{{Term|location=node|cmd=IP=`hostname -i`}}
{{Term|location=node|cmd=<code class='command'>perl</code> -i -pe "s/next-server .*/next-server $IP;/" /etc/dhcp3/dhcpd.conf}}
If there is no next-server configured, you must edit the file by hand and add a line like this:
next-server XX.XX.XX.XX ;


where XX.XX.XX.XX is the IP of your node (echo $IP).
Please look at the other KaVLAN pages for examples of usage (look at the '''see-also dialog box''' at the top of the page).

Latest revision as of 17:44, 3 July 2025

Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.


Overview

KaVLAN big picture

KaVLAN provides network isolation capabilities for Grid'5000 users' experiments, via a high-level, user-driven interface to VLANs (802.1Q).

Said differently: KaVLAN allows users to manage VLANs for the network connection of their Grid'5000 nodes.

Behind the scenes, KaVLAN actually changes the configuration of the network switches of Grid'5000 infrastructure, to set the VLAN membership (VLAN ID) for the ports which are cabled to the network interfaces of one or more nodes.

The benefit is a complete level 2 isolation for users' experiments.

It is however important to note that KaVLAN does not guarantee performance isolation: on sites with a hierarchical network (such as Nancy), inter-switch links may indeed be shared between various VLANs/experiments.

For experiments involving network reconfiguration, KaVLAN is to be used together with OAR and Kadeploy (to reserve resources and to gain control over the operating system and network configuration of the nodes).

The 3 KaVLAN VLAN types

3 types of VLANs are available for users in Grid'5000: local, routed and global.

KaVLAN name in OAR | type   | first id | last id
kavlan-local       | local  | 1        | 3
kavlan             | routed | 4        | 9
kavlan-global      | global | 10       | 21
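
The ID ranges in the table above can be turned into a small lookup, which is handy in scripts that behave differently per VLAN type. The following is only a sketch: `kavlan_type_of_id` is a hypothetical helper, not part of the kavlan command, and the ranges are hard-coded from the table.

```shell
# Map a KaVLAN ID to its VLAN type, using the ID ranges from the table above.
# kavlan_type_of_id is a hypothetical helper, not part of the kavlan CLI.
kavlan_type_of_id() {
    case $1 in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;   # not a plain integer
    esac
    if [ "$1" -ge 1 ] && [ "$1" -le 3 ]; then
        echo "local"
    elif [ "$1" -ge 4 ] && [ "$1" -le 9 ]; then
        echo "routed"
    elif [ "$1" -ge 10 ] && [ "$1" -le 21 ]; then
        echo "global"
    else
        echo "unknown"
        return 1
    fi
}

kavlan_type_of_id 2    # prints: local
kavlan_type_of_id 7    # prints: routed
kavlan_type_of_id 12   # prints: global
```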

See the 2 schemas on the right of this page, which illustrate KaVLAN big picture and architecture.

1: Local VLAN

From the IP routing point of view, a local VLAN is completely isolated from the rest of Grid'5000. No IP routing is configured in any router of the infrastructure. Therefore, to reach your nodes inside that kind of VLAN, the Grid'5000 infrastructure provides a special host you can hop through: the SSH gateway of the VLAN. For each local VLAN, the hostname of that SSH gateway is: kavlan-ID.


Then you can connect to any of your nodes within the VLAN using hostnames such as hostname-X-kavlan-ID (adding the suffix -kavlan- + the VLAN_ID to the regular hostname), for instance from the SSH gateway of the VLAN, or from node to node (with the default provided DNS configuration in the VLAN).
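
To avoid typing the hop through the SSH gateway every time, an earlier revision of this page suggested adding ProxyCommand stanzas to the .ssh/config file on the frontend. The sketch below generates those stanzas for a list of local VLAN IDs. It assumes that `nc` is available on the gateway, as in that earlier revision; `print_kavlan_ssh_config` is a hypothetical helper name.

```shell
# Print ~/.ssh/config stanzas that route SSH connections to nodes of a
# local VLAN through that VLAN's SSH gateway (kavlan-ID).
# print_kavlan_ssh_config is a hypothetical helper name.
print_kavlan_ssh_config() {
    for id in "$@"; do
        printf 'Host *-*-kavlan-%s *-*-kavlan-%s.*.grid5000.fr\n' "$id" "$id"
        printf '    ProxyCommand ssh -q -a -x kavlan-%s nc %%h %%p\n' "$id"
    done
}

# Local VLAN IDs are 1 to 3; review the output before appending it
# to ~/.ssh/config on the frontend.
print_kavlan_ssh_config 1 2 3
```

With such stanzas in place, `ssh root@hostname-X-kavlan-ID` from the frontend should go through the gateway without an explicit intermediate login.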


The figure below shows two jobs using KaVLAN: each job has its nodes isolated in a local VLAN (green and purple). The other nodes are all in the default VLAN (red). The only way to reach the isolated nodes is to hop through the VLAN's SSH gateway machine (kavlan-1 and kavlan-2 in the figure). Technically speaking, the SSH gateway has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. Another way to reach an isolated node is to use the kaconsole command.

KaVLAN architecture: 2 jobs running KaVLAN
Note

Please note that:

  • as your nodes are isolated from the rest of Grid'5000, NFS mounts of the /home partition are not possible. Therefore, Grid'5000 environments that mount the /home partition (-nfs, -big, -std) may fail to boot.

2: Routed VLAN

Unlike local VLANs, which are isolated, routed VLANs are not isolated at layer 3: IP packets are routed. Therefore you can reach the nodes inside a routed VLAN from the rest of Grid'5000 (e.g. from the default VLAN, or from another routed VLAN). There is no need to hop through an SSH gateway, as is the case for local VLANs.

Nodes in the VLAN are reachable with the following hostname: hostname-X-kavlan-ID (same naming scheme as for local VLANs), from the frontends of the sites for instance.
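
The naming scheme can be applied mechanically to a regular hostname. The sketch below inserts the -kavlan-ID suffix before the domain part and optionally adds an interface name for nodes with several cabled interfaces (the hostname-X-ethY-kavlan-ID form). `kavlan_hostname` is a hypothetical helper name.

```shell
# Build the in-VLAN hostname of a node from its regular hostname.
# Usage: kavlan_hostname HOST VLAN_ID [IFACE]
# kavlan_hostname is a hypothetical helper; IFACE (e.g. eth1) selects a
# secondary interface, following the hostname-X-ethY-kavlan-ID scheme.
kavlan_hostname() {
    host=$1 id=$2 iface=${3:-}
    short=${host%%.*}                # node name, e.g. azur-25
    case $host in
        *.*) domain=.${host#*.} ;;   # keep the domain part if there is one
        *)   domain= ;;
    esac
    if [ -n "$iface" ]; then
        short=$short-$iface
    fi
    printf '%s-kavlan-%s%s\n' "$short" "$id" "$domain"
}

kavlan_hostname azur-25.sophia.grid5000.fr 7        # azur-25-kavlan-7.sophia.grid5000.fr
kavlan_hostname grisou-2.nancy.grid5000.fr 1 eth1   # grisou-2-eth1-kavlan-1.nancy.grid5000.fr
```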

3: Global VLAN

Global VLANs are VLANs which span all Grid'5000 sites. Therefore you can configure nodes of different sites in the same global VLAN, i.e. in the same Ethernet network (no inter-site IP routing is required; nodes in a global VLAN share the same broadcast domain).

(underneath they use the IEEE 802.1ad encapsulation, also known as QinQ to provide a same layer 2 network for all sites.)

There is exactly 1 and only 1 global VLAN provided by site. If that VLAN is already reserved by another user, you can try to get one from another site. Reservation must be made on the site of the global VLAN.

Since it is a same layer 2 network, no routing between the nodes which are placed in a global VLAN is required (even from site to site).

To reach nodes inside a global VLAN from outside, routing is configured on the router of the site where the global VLAN is reserved. The hostnames of nodes within a VLAN follow the same scheme as above: hostname-X-kavlan-ID.

Note

Please mind that there is no performance isolation between the global VLANs and the Grid'5000 inter-site VLAN (backbone VLAN): they all share the same inter-site physical link.

Note

In Grid'5000's vlan API, the VLAN type kavlan-global-remote can also be found. It is the same type as kavlan-global; the distinction is due to KaVLAN's internal machinery.

KaVLAN architecture: see local VLANs in green, routed VLANs in blue, global VLANs in purple and the default VLAN in red


Reserving a VLAN

Using KaVLAN requires using deploy reservations for the nodes, because it necessarily involves reconfiguring the network stack of the operating system of the nodes.

To obtain both nodes and a VLAN, you must reserve kavlan resources (VLAN-IDs) with OAR using the oarsub command. OAR manages the 3 kinds of VLAN resources, shown in the table above:

  • kavlan
  • kavlan-local
  • kavlan-global

For example, if you need 3 nodes and a local VLAN, you can run:

frontend:
oarsub -t deploy -l {"type='kavlan-local'"}/vlan=1+/nodes=3 -I
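
For scripted reservations, the same command line can be assembled programmatically. The sketch below only builds the argument list; it does not call OAR. The helper name is hypothetical, and applying the pattern of the example above to the kavlan and kavlan-global resource types is an assumption. Note that the shell removes the double quotes of the example, so the argument actually received by oarsub is {type='kavlan-local'}/vlan=1+/nodes=3.

```python
import shlex

# The three VLAN resource types managed by OAR, as listed above.
VLAN_TYPES = ("kavlan", "kavlan-local", "kavlan-global")


def oarsub_argv(vlan_type: str, nodes: int, interactive: bool = True) -> list[str]:
    """Build the oarsub arguments reserving one VLAN and some nodes."""
    if vlan_type not in VLAN_TYPES:
        raise ValueError(f"unknown VLAN resource type: {vlan_type}")
    if nodes < 1:
        raise ValueError("at least one node is required")
    # Doubled braces produce literal { and } in an f-string.
    resources = f"{{type='{vlan_type}'}}/vlan=1+/nodes={nodes}"
    argv = ["oarsub", "-t", "deploy", "-l", resources]
    if interactive:
        argv.append("-I")
    return argv


# shlex.join() gives a shell-quoted form that can be pasted on a frontend.
print(shlex.join(oarsub_argv("kavlan-local", 3)))
```

Passing the list to subprocess.run() on a frontend would avoid any shell quoting issue, since each argument is handed to oarsub as is.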

Then you can get the ID of your VLAN using the kavlan command:

frontend:
kavlan -V

If you need to run that command from outside the shell started by OAR for your reservation, you have to give the OAR job ID:

frontend:
kavlan -V -j JOBID


Either way, you should get an integer VLAN ID, in the:

  • [1-3] range for local VLANs
  • [4-9] range for routed VLANs
  • and ID ≥ 10 for global VLANs
Note

Reminder: only one global VLAN ID is available per site. On a given site, it is only possible to reserve the global VLAN associated with that site. Once reserved, however, any global VLAN can be used on all sites.
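
The ID ranges above can be turned into a small check, for instance to validate the value returned by kavlan -V in a script. This is only a sketch; the function name is hypothetical.

```python
def vlan_kind(vlan_id: int) -> str:
    """Return 'local', 'routed' or 'global' for a KaVLAN ID."""
    if 1 <= vlan_id <= 3:
        return "local"
    if 4 <= vlan_id <= 9:
        return "routed"
    if vlan_id >= 10:
        return "global"
    raise ValueError(f"not a KaVLAN ID: {vlan_id}")


for vlan_id in (2, 5, 12):
    print(vlan_id, vlan_kind(vlan_id))
```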

IP addresses associated with VLANs

The tables below list the KaVLAN IDs and their associated IP subnets (served by DHCP in the VLANs).

Local VLANs (non-routed)

Site  KAVLAN-1          KAVLAN-2          KAVLAN-3
All   192.168.192.0/20  192.168.208.0/20  192.168.224.0/20

Routed VLANs

Site        KAVLAN-4      KAVLAN-5       KAVLAN-6        KAVLAN-7        KAVLAN-8      KAVLAN-9
Bordeaux    10.0.0.0/18   10.0.64.0/18   10.0.128.0/18   10.0.192.0/18   10.1.0.0/18   10.1.64.0/18
Grenoble    10.4.0.0/18   10.4.64.0/18   10.4.128.0/18   10.4.192.0/18   10.5.0.0/18   10.5.64.0/18
Lille       10.8.0.0/18   10.8.64.0/18   10.8.128.0/18   10.8.192.0/18   10.9.0.0/18   10.9.64.0/18
Lyon        10.12.0.0/18  10.12.64.0/18  10.12.128.0/18  10.12.192.0/18  10.13.0.0/18  10.13.64.0/18
Nancy       10.16.0.0/18  10.16.64.0/18  10.16.128.0/18  10.16.192.0/18  10.17.0.0/18  10.17.64.0/18
Orsay       10.20.0.0/18  10.20.64.0/18  10.20.128.0/18  10.20.192.0/18  10.21.0.0/18  10.21.64.0/18
Rennes      10.24.0.0/18  10.24.64.0/18  10.24.128.0/18  10.24.192.0/18  10.25.0.0/18  10.25.64.0/18
Toulouse    10.28.0.0/18  10.28.64.0/18  10.28.128.0/18  10.28.192.0/18  10.29.0.0/18  10.29.64.0/18
Sophia      10.32.0.0/18  10.32.64.0/18  10.32.128.0/18  10.32.192.0/18  10.33.0.0/18  10.33.64.0/18
Strasbourg  10.36.0.0/18  10.36.64.0/18  10.36.128.0/18  10.36.192.0/18  10.37.0.0/18  10.37.64.0/18
Luxembourg  10.40.0.0/18  10.40.64.0/18  10.40.128.0/18  10.40.192.0/18  10.41.0.0/18  10.41.64.0/18
Nantes      10.44.0.0/18  10.44.64.0/18  10.44.128.0/18  10.44.192.0/18  10.45.0.0/18  10.45.64.0/18
Louvain     10.48.0.0/18  10.48.64.0/18  10.48.128.0/18  10.48.192.0/18  10.49.0.0/18  10.49.64.0/18

Note

At the end of each network, the address x.x.x.253 is used by the KaVLAN server.
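
To illustrate, the sketch below computes that address for a given subnet with Python's ipaddress module. It assumes that "at the end of each network" means the address two below the broadcast address (x.x.x.255); this reading and the function name are assumptions, not something this page states explicitly.

```python
import ipaddress


def kavlan_server_address(subnet: str) -> ipaddress.IPv4Address:
    """Address ending in .253 in the last /24 of the given VLAN subnet."""
    network = ipaddress.ip_network(subnet)
    # The broadcast address ends in .255, so two addresses below it ends in .253.
    return network.broadcast_address - 2


print(kavlan_server_address("192.168.192.0/20"))
print(kavlan_server_address("10.0.0.0/18"))
```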

Global VLANs

Site        Global VLAN  Subnet          Router IP
Bordeaux    KAVLAN-10    10.3.192.0/18   10.3.255.254
Grenoble    KAVLAN-11    10.7.192.0/18   10.7.255.254
Lille       KAVLAN-12    10.11.192.0/18  10.11.255.254
Lyon        KAVLAN-13    10.15.192.0/18  10.15.255.254
Nancy       KAVLAN-14    10.19.192.0/18  10.19.255.254
Orsay       KAVLAN-15    10.23.192.0/18  10.23.255.254
Rennes      KAVLAN-16    10.27.192.0/18  10.27.255.254
Toulouse    KAVLAN-17    10.31.192.0/18  10.31.255.254
Sophia      KAVLAN-18    10.35.192.0/18  10.35.255.254
Strasbourg  KAVLAN-19    10.39.192.0/18  10.39.255.254
Luxembourg  KAVLAN-20    10.43.192.0/18  10.43.255.254
Nantes      KAVLAN-21    10.47.192.0/18  10.47.255.254
Louvain     KAVLAN-22    10.51.192.0/18  10.51.255.254
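
The routed and global VLAN tables above follow a regular pattern, which the following sketch reproduces with Python's ipaddress module. The pattern is inferred from the listed values, not stated as a rule by this page, and the function names are hypothetical: the site at position i of the tables uses the block 10.(4i).0.0/14, routed VLANs 4 to 9 are the first six /18 subnets of that block, and the global VLAN of the site (ID 10 + i) is the last /18, whose router takes the last usable address.

```python
import ipaddress

# Sites in the order of the tables above.
SITES = ["Bordeaux", "Grenoble", "Lille", "Lyon", "Nancy", "Orsay", "Rennes",
         "Toulouse", "Sophia", "Strasbourg", "Luxembourg", "Nantes", "Louvain"]


def site_subnets(site: str) -> list:
    """The sixteen /18 subnets of the /14 block observed for a site."""
    index = SITES.index(site)
    block = ipaddress.ip_network(f"10.{4 * index}.0.0/14")
    return list(block.subnets(new_prefix=18))


def routed_subnet(site: str, vlan_id: int):
    """Subnet of a routed VLAN (IDs 4 to 9) on the given site."""
    if not 4 <= vlan_id <= 9:
        raise ValueError("routed VLAN IDs range from 4 to 9")
    return site_subnets(site)[vlan_id - 4]


def global_vlan(site: str):
    """(VLAN ID, subnet, router IP) of the global VLAN of a site."""
    subnet = site_subnets(site)[-1]
    # The router IP in the table is the last usable address of the subnet.
    return 10 + SITES.index(site), subnet, subnet.broadcast_address - 1


print(routed_subnet("Lille", 5))
print(global_vlan("Lille"))
```

The tables remain the reference; a computation like this one is mainly useful to cross-check values or to find which site a 10.x.y.z address belongs to.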

IP subnet assignments for the sites within a global VLAN

A global VLAN is a /18 subnet (16382 usable IP addresses). It is split so that every site gets one /23 (510 usable IP addresses) of the global VLAN address space.

Example for the global VLAN of Lille, KAVLAN-12, whose address space is 10.11.192.0/18:

  • Bordeaux: 10.11.192.1 to 10.11.193.254
  • Grenoble: 10.11.194.1 to 10.11.195.254
  • Lille: 10.11.196.1 to 10.11.197.254
  • Lyon: 10.11.198.1 to 10.11.199.254
  • Nancy: 10.11.200.1 to 10.11.201.254
  • Orsay: 10.11.202.1 to 10.11.203.254
  • Rennes: 10.11.204.1 to 10.11.205.254
  • Toulouse: 10.11.206.1 to 10.11.207.254
  • Sophia: 10.11.208.1 to 10.11.209.254
  • Strasbourg: 10.11.210.1 to 10.11.211.254
  • Luxembourg: 10.11.212.1 to 10.11.213.254
  • Nantes: 10.11.214.1 to 10.11.215.254
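
The split listed above can be reproduced with Python's ipaddress module. In this sketch, the function name is hypothetical, and the site order (taken from the list above, limited to the twelve sites it names) is assumed to be the same for every global VLAN.

```python
import ipaddress

# Sites in the order of the list above.
SITES = ["Bordeaux", "Grenoble", "Lille", "Lyon", "Nancy", "Orsay", "Rennes",
         "Toulouse", "Sophia", "Strasbourg", "Luxembourg", "Nantes"]


def site_ranges(global_subnet: str) -> dict:
    """First and last usable address of each site's /23 in a global VLAN."""
    network = ipaddress.ip_network(global_subnet)
    ranges = {}
    # zip() stops after the last listed site; the remaining /23 blocks
    # of the /18 are simply not assigned by this sketch.
    for site, block in zip(SITES, network.subnets(new_prefix=23)):
        ranges[site] = (str(block[1]), str(block[-2]))
    return ranges


ranges = site_ranges("10.11.192.0/18")  # KAVLAN-12, global VLAN of Lille
for site, (first, last) in ranges.items():
    print(f"{site}: {first} to {last}")
```

A /18 contains 32 blocks of size /23, so the address space leaves room for more sites than are listed here.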

(More info in the Network page)

A DHCP service is provided in all VLANs (local, routed, and global).

Once the network interfaces of some nodes have been put in a VLAN, the nodes need to get IP addresses that are valid within that VLAN: the interfaces can be brought down and up again, the networking service of the operating system can be restarted, or the node can be rebooted.

Note

When changing the network configuration, kaconsole3 may prove very handy. The at command can also be used to run a command asynchronously and work around the network disconnection, e.g.:

... | at now + 1 minute

If needed for your experiment, please note that the kavlan command allows you to deactivate the DHCP service in a VLAN (see below).

Reminder: for local VLANs, you are also allowed to ssh to the VLAN's SSH gateway, which is named kavlan-ID.

Setting up the VLAN

Configuring the VLANs is done with the kavlan command.

All the options of the command can be shown using --help, as follows:

# kavlan --help
Usage: kavlan [options]
Specific options:
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
    -C, --ca-cert CA                 CA certificate
    -c, --client-cert CERT           client certificate
    -k, --client-key KEY             client key
    -l, --get-nodelist               Show nodenames in the given vlan
    -e, --enable-dhcp                Start DHCP server
    -d, --disable-dhcp               Stop DHCP server
    -V, --show-vlan-id               Show vlan id of job (needs -j JOBID)
    -g, --get-vlan                   Show vlan of nodes
    -s, --set-vlan                   Set vlan of nodes
    -j, --oar-jobid JOBID            OAR job id
    -m, --machine NODE               set nodename (several -m are OK)
    -f, --filename NODEFILE          read nodes from a file
    -u, --user USERNAME              username
    -v, --[no-]verbose               Run verbosely
    -q, --[no-]quiet                 Run quietly
        --[no-]debug                 Run with debug output
    -h, --help                       Show this message
        --version                    Show version

So, once you have a kavlan job running and know your VLAN ID, you can use the kavlan command to put some network interfaces of your nodes in your VLAN (and later, back into the default VLAN) at any time during the lifetime of your job.
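
As an illustration, the sketch below assembles such a kavlan command line from the options listed in the help output above (-s, -i, -j, -m). It only builds the argument list and does not run anything. The helper name is hypothetical, and this particular combination of options is an assumption drawn from the help text; the other KaVLAN pages remain the reference for actual usage.

```python
import shlex


def kavlan_set_vlan_argv(vlan_id, nodes, job_id=None):
    """Build a kavlan command line putting nodes in a VLAN.

    vlan_id is an integer, or "DEFAULT" to go back to the default VLAN.
    """
    argv = ["kavlan", "-s", "-i", str(vlan_id)]
    if job_id is not None:
        argv += ["-j", str(job_id)]
    for node in nodes:
        argv += ["-m", node]  # the help states that several -m are OK
    return argv


nodes = ["grisou-2.nancy.grid5000.fr", "grisou-3.nancy.grid5000.fr"]
print(shlex.join(kavlan_set_vlan_argv(4, nodes)))
print(shlex.join(kavlan_set_vlan_argv("DEFAULT", nodes)))
```

For long node lists, the -f option (read nodes from a file) shown in the help output may be more convenient than repeating -m.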

In case of a node with multiple cabled network interfaces, each of them can be used, with the following naming:

  • for the default interface: hostname-X-kavlan-ID
  • for other interfaces: hostname-X-ethY-kavlan-ID
Note

You may notice that the hostname for secondary interfaces has the form "hostname-X-ethY-kavlan-ID", while the name of the interface in the system is "enoY" or "enpYsZ".

This is due to changes in the naming of interfaces since Debian 9 (see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/).
KaVLAN hostnames still use the old interface names. If you are not sure which name corresponds to which interface, both namings (old and new) are described in the API.

For example, grisou-2-eth1-kavlan-1.nancy.grid5000.fr corresponds to interface eno2 on grisou-2 (https://api.grid5000.fr/stable/sites/nancy/clusters/grisou/nodes/grisou-2.json).
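
To make the naming scheme concrete, the following sketch splits a KaVLAN hostname into its node name, old-style interface name and VLAN ID. The function name and the regular expression are hypothetical; in particular, the pattern assumes node names of the form cluster-X with a lowercase alphanumeric cluster name, as in the grisou example.

```python
import re

# Host part: <cluster>-<X>[-eth<Y>]-kavlan-<ID>, optionally followed by a domain.
_KAVLAN_NAME = re.compile(
    r"^(?P<node>[a-z0-9]+-\d+)(?:-eth(?P<eth>\d+))?"
    r"-kavlan-(?P<vlan>\d+)(?P<domain>\..+)?$"
)


def parse_kavlan_hostname(name: str):
    """Split a KaVLAN hostname into (node, old-style interface, VLAN ID)."""
    match = _KAVLAN_NAME.match(name)
    if match is None:
        raise ValueError(f"not a KaVLAN hostname: {name}")
    node = match["node"] + (match["domain"] or "")
    # Without an -ethY part, the name designates the default interface (None here).
    interface = f"eth{match['eth']}" if match["eth"] is not None else None
    return node, interface, int(match["vlan"])


print(parse_kavlan_hostname("grisou-2-eth1-kavlan-1.nancy.grid5000.fr"))
print(parse_kavlan_hostname("grisou-2-kavlan-4"))
```

The mapping from the old-style name (eth1) to the system name (eno2) is not derived here; it has to be looked up in the API description of the node.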


Going Further

Please look at the other KaVLAN pages for examples of usage (look at the see-also dialog box at the top of the page).