Advanced KaVLAN: Difference between revisions
Revision as of 16:04, 21 January 2019
|   | Note | 
|---|---|
| This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team. | |
Overview
The goal of KaVLAN is to provide network isolation for Grid'5000 users. KaVLAN allows users to manage VLANs on their Grid'5000 nodes. The benefit is complete layer 2 isolation. It can be used together with OAR and Kadeploy to run experiments on the platform.
The first step is to read the KaVLAN introduction to understand what kind of VLANs you can configure.
If you want a more concrete example of what you can do with VLANs on Grid'5000, you can go through the Network_reconfiguration_tutorial.
Reserve VLANs and deploy nodes inside
In the first part of the tutorial, we will use kadeploy and kavlan together on a single site, with a routed vlan (we could also use a local vlan).
Once connected on a frontend, in order to obtain nodes and a VLAN you must reserve a kavlan resource with oarsub. There are 3 kinds of resources: kavlan, kavlan-local, kavlan-global. Here, we will use 3 nodes and a routed VLAN, let's say in Sophia on cluster Suno:
A shell is now opened on the frontend (like any regular deploy job)
You can get the ID of your VLAN with kavlan -V.
If you run this command outside the shell started by OAR for your reservation, you must add the OAR job ID with -j JOBID.
You should get an integer in the <4-9> range for this routed VLAN (the range for local vlan is <1-3>, and there is one global VLAN per OAR server, i.e. one per site).
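As a quick sanity check on the ID you get back, the ranges above can be written down as a small helper. This is only a sketch: the function name kavlan_kind is made up for illustration, and the 10..21 range for global VLANs is taken from the range probed by the gen_dhcpd_conf.rb script later on this page, not from an official specification.

```ruby
# Map a KaVLAN ID to its kind, using the ranges quoted in this tutorial.
# kavlan_kind is a hypothetical helper, not part of the kavlan tools.
def kavlan_kind(vlan_id)
  case vlan_id
  when 1..3   then :local    # isolated, reachable only through a gateway
  when 4..9   then :routed   # routed VLAN, as used in this example
  when 10..21 then :global   # assumption: range probed by gen_dhcpd_conf.rb
  else raise ArgumentError, "unexpected VLAN id: #{vlan_id}"
  end
end

puts kavlan_kind(4)
```

With the reservation used in this example (VLAN 4), the helper reports a routed VLAN.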
For our example, let's say we got suno-2, suno-30 and suno-31, and kavlan #4.
You can get all the options of the command using --help:
# kavlan --help
Usage: kavlan [options]
Specific options:
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
    -C, --ca-cert CA                 CA certificate
    -c, --client-cert CERT           client certificate
    -k, --client-key KEY             client key
    -l, --get-nodelist               Show nodenames in the given vlan
    -e, --enable-dhcp                Start DHCP server
    -d, --disable-dhcp               Stop DHCP server
    -V, --show-vlan-id               Show vlan id of job (needs -j JOBID)
    -g, --get-vlan                   Show vlan of nodes
    -s, --set-vlan                   Set vlan of nodes
    -j, --oar-jobid JOBID            OAR job id
    -m, --machine NODE               set nodename (several -m are OK)
    -f, --filename NODEFILE          read nodes from a file
    -u, --user USERNAME              username
    -v, --[no-]verbose               Run verbosely
    -q, --[no-]quiet                 Run quietly
        --[no-]debug                 Run with debug output
    -h, --help                       Show this message
        --version                    Show version
Once you have a kavlan reservation running, you can put your nodes in your VLAN (and back into the default VLAN) at any time during the lifetime of your job; we will not use this for now.
Instead we will change the VLAN with kadeploy directly. The next step is to deploy the nodes with an environment image, for instance debian9-x64-big.
Enable the dhcp server of the VLAN
Before deploying, if you don't install your own DHCP server, you should start the default DHCP server of the VLAN. Do this with kavlan -e (add -j JOBID if needed).
(You can disable the DHCP server with kavlan -d)
Deploy nodes and change VLAN in one step
Once the deployment is done, you will be able to connect to your nodes. They are now inside the VLAN, so they are no longer reachable at their default IP:
PING suno-30.sophia.grid5000.fr (172.16.130.30) 56(84) bytes of data.
From fsophia.sophia.grid5000.fr (172.16.143.106) icmp_seq=1 Destination Host Unreachable
--- suno-30.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
You can get the list of new hostnames of your nodes in the VLAN with kavlan -l. For the next part of this tutorial, let's create a nodefile and copy it on the first node:
suno-2-kavlan-4.sophia.grid5000.fr
suno-30-kavlan-4.sophia.grid5000.fr
suno-31-kavlan-4.sophia.grid5000.fr
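The hostnames listed above follow a simple pattern: "-kavlan-VLANID" is appended to the short node name. A minimal sketch of that rewriting, reusing the regular expression from the gen_dhcpd_conf.rb script further down this page (the helper name vlan_hostname is an assumption made for this example; kavlan -l remains the authoritative source):

```ruby
# Build the in-VLAN hostname of a node from its default hostname.
# vlan_hostname is a hypothetical helper name used for illustration.
def vlan_hostname(hostname, vlan_id)
  hostname.sub(/^(\w+-\d+)(\..*)$/) { "#{$1}-kavlan-#{vlan_id}#{$2}" }
end

%w[suno-2 suno-30 suno-31].each do |node|
  puts vlan_hostname("#{node}.sophia.grid5000.fr", 4)
end
```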
The password for user root on Grid'5000 environments is "grid5000".
You can see that you can ping these new hostnames:
PING suno-30-eth0-kavlan-4.sophia.grid5000.fr (10.32.3.30) 56(84) bytes of data.
64 bytes from suno-30-eth0-kavlan-4.sophia.grid5000.fr (10.32.3.30): icmp_seq=1 ttl=63 time=0.151 ms
--- suno-30-eth0-kavlan-4.sophia.grid5000.fr ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.151/0.151/0.151/0.000 ms
Setup a DHCP server on your nodes
Configure DHCP
If you need to run your own DHCP server (for example, to run a cluster distribution inside KaVLAN or to test kadeploy), you can use a script to generate the configuration file:
Then, go back to the frontend, and download the script that will generate your DHCP configuration:
Create this file (gen_dhcpd_conf.rb) on the frontend:
#!/usr/bin/ruby
# Author: Nicolas Niclausse
# Copyright 2010-2011: INRIA
# script specific to grid5000:
# generate dhcpd config files for kavlan

require 'rubygems'
require 'restfully' # gem install restfully --source http://gemcutter.org
require 'ip' # gem install ruby-ip
require 'yaml' # needed for YAML.load_file
require 'socket' # needed for IPSocket
require 'getoptlong'
require 'optparse'
require 'ostruct'

headers = "ddns-update-style none;
option space pxelinux;
option pxelinux.magic code 208 = string;
option pxelinux.configfile code 209 = text;
option pxelinux.pathprefix code 210 = text;
option pxelinux.reboottime code 211 = unsigned integer 32;
option vendorinfo code 43 = string;
"

conf = File.expand_path('~/.restfully/api.grid5000.fr.yaml')
options = if File.exist?(conf) then
  YAML.load_file(conf)
else
  {}
end
options[:base_uri] = 'https://api.grid5000.fr/stable/grid5000'

def parseopts(args)
  options = OpenStruct.new
  options.debug = false
  options.verbose = false
  options.quiet = false
  options.nodes = []
  opts = OptionParser.new do |opts|
    opts.banner = "Usage: gen_dhcpd_conf.rb [options]"
    opts.separator ""
    opts.separator "Specific options:"
    opts.on("-s","--site SITE", "generate only DHCP conf for site SITE") do |site|
      options.site = site
    end
    opts.on("-i","--vlan-id N", Integer , "generate only DHCP conf for vlan N") do |vlan|
      options.vlan = vlan
    end
    opts.on("-q", "--[no-]quiet", "Run quietly") do |q|
      options.quiet = q
    end
    opts.on("-v", "--[no-]verbose", "Run verbosely") do |v|
      options.verbose = v
    end
    opts.on_tail("-h", "--help", "Show this message") do
      puts opts
      exit
    end
  end
  opts.parse!(args)
  options
end

$opts = parseopts(ARGV)

Restfully::Session.new(options) do |root, session|
  options = {:query => {:version => root['version']}}
  root.sites(options).each do |site|
    mysite=site['uid']
    next if not $opts.site.nil? and mysite != $opts.site
    # optionally, read mac address from external yaml file
    ref = if File.exist?(mysite+".yaml") then
      YAML.load_file(mysite+".yaml")
    else
      puts mysite +": no yaml file for macs" unless $opts.quiet
      {}
    end
    if $opts.vlan.nil? then
      vlans = (1..9).to_a
      # try to guess global vlan assigned to current site
      (10..21).each do |gvlan|
        begin
          IPSocket::getaddress("gw-kavlan-"+gvlan.to_s+"."+mysite+".grid5000.fr")
          puts "global vlan found for site %s: " % mysite unless $opts.quiet
          vlans.push(gvlan)
        rescue
          next
        end
      end
    else
      vlans = [$opts.vlan]
    end
    vlans.each do |vlan|
      filename = "dhcpd-kavlan-"+vlan.to_s+"-"+mysite+".conf"
      open(filename, 'w') do |f|
        puts "generating "+filename unless $opts.quiet
        f.puts headers
        begin
          gateway = IPSocket::getaddress("gw-kavlan-"+vlan.to_s+"."+mysite+".grid5000.fr")
        rescue
          puts "WARN: Get address error: probably no kavlan DNS setup for site " + mysite + " , skip" if $opts.verbose
          next
        end
        # /20 for local vlans (1..3) and /18 for routed vlan (4..9)
        if vlan < 4
          ip = IP.new(gateway+"/20")
          ns = gateway
          ntp = gateway
          tftp = gateway
        else
          ip = IP.new(gateway+"/18")
          ntp = IPSocket::getaddress("ntp."+mysite+".grid5000.fr")
          ns = IPSocket::getaddress("dns."+mysite+".grid5000.fr")
          tftp = IPSocket::getaddress("kadeploy-server."+mysite+".grid5000.fr")
        end
        netmask = ip.netmask.to_addr
        broadcast = ip.broadcast.to_addr
        network = ip.network.to_addr
        f.puts "subnet %s netmask %s {" % [network , netmask]
        f.puts " default-lease-time 86400; max-lease-time 604800;"
        #f.puts " option domain-name \"%s.grid5000.fr\"; " % mysite
        f.puts " option domain-name-servers %s;" % ns
        f.puts " option ntp-servers %s; " % ntp
        f.puts " option routers %s;" % gateway
        f.puts " option subnet-mask %s; " % netmask
        f.puts " option broadcast-address %s;" % broadcast
        f.puts " filename \"pxelinux.0\";"
        f.puts " next-server %s;" % tftp
        sites_for_vlan = if vlan < 10
          [ site ]
        else
          root.sites(options)
        end
        sites_for_vlan.each do |currentsite|
          currentsite.clusters(options).each do |cluster|
            cluster.nodes(options).each do |node|
              sitename=currentsite['uid']
              device = node['network_adapters'].find{|s| s['network_address'] =~ /^\w+-\d+\.\w+\.grid5000\.fr/}
              next if device.nil?
              hostname = device['network_address']
              next if hostname.nil?
              hostname_vlan = hostname.gsub(/^(\w+-\d+)(\..*)$/){$1+"-kavlan-"+vlan.to_s+$2}
              shortname_vlan = hostname_vlan.gsub(/^(\w+-\d+-\w+-\d+)(\..*)$/){$1}
              shortname = hostname.gsub(/^(\w+-\d+)(\..*)$/){$1}
              realsite = hostname.split(".")[1]
              begin
                vlan_ip = IPSocket::getaddress(hostname_vlan)
              rescue
                puts "WARN: Get address error: probably no DNS setup for vlan " +vlan.to_s+" on site " + sitename + " , skip" if $opts.verbose
                next
              end
              if device['mac'].nil? then
                if ref[shortname].nil? then
                  puts "WARN: mac undefined for host %s, skip" % hostname unless $opts.quiet
                  next
                else
                  mac = ref[shortname]['mac_eth0']
                end
              else
                mac = device['mac']
              end
              f.puts " host %s {" % hostname_vlan
              f.puts " hardware ethernet %s; " % mac
              f.puts " option host-name \"%s\";" % shortname_vlan
              f.puts " option domain-name \"%s\.grid5000.fr\";" % realsite
              f.puts " fixed-address %s;" % vlan_ip
              if vlan > 9
                # for global vlan, we need the local tftp server
                currenttftp = IPSocket::getaddress("kadeploy-server."+sitename+".grid5000.fr")
                f.puts " next-server %s;" % currenttftp
              end
              f.puts " }"
            end
          end
        end
        f.puts "}"
      end
    end
  end
end
(This script uses the restfully and ruby-ip gems.)
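To see what the script computes for the subnet declaration, the same arithmetic (a /20 for local VLANs, a /18 otherwise) can be reproduced with Ruby's standard ipaddr library instead of the ruby-ip gem. This is a sketch for checking values by hand, not a replacement for the script; subnet_info is a name chosen for this example, and the sample address is the in-VLAN node address shown in the ping output earlier on this page.

```ruby
require 'ipaddr'

# Return the network, netmask and broadcast address for an address and
# prefix length, i.e. the values the script writes into the subnet block.
def subnet_info(address, prefix)
  net  = IPAddr.new("#{address}/#{prefix}")            # masked to the network address
  mask = IPAddr.new('255.255.255.255').mask(prefix)    # dotted-quad netmask
  { network: net.to_s, netmask: mask.to_s, broadcast: net.to_range.last.to_s }
end

# Routed VLAN example: the script uses a /18 for VLAN IDs of 4 and above.
puts subnet_info('10.32.3.30', 18)
```

Passing 20 as the prefix gives the values used for local VLANs (IDs 1 to 3).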
Then, generate the configuration (replace SITE and VLANID with your current site and VLAN ID), and copy it to the node:
For user accounts, you need to set your GEM_HOME directory, because you cannot install the ruby-ip gem in the default system location. To do so, type:
You have to disable the default DHCP server of the VLAN:
On the frontend
Now install a DHCP server on the node (we assume the node is not yet in the job VLAN, or that the VLAN is routed and has Internet access for apt):
|   | node: | apt-get install isc-dhcp-server |
There may be an error after the installation. This is normal: you need to tell the DHCP server on which interface to listen for DHCP requests (replace "eno1" with the name of the interface on which the server should listen):
|   | node: | sed -i s/INTERFACESv4=\"\"/INTERFACESv4=\"eno1\"/g /etc/default/isc-dhcp-server |
You can now copy the generated configuration file and start the DHCP server:
Then, in another shell, connect as root on a second node (or use kaconsole):
And restart the network configuration:
...
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
[ 5185.656817] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
[ 5185.670596] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
DHCPOFFER from 10.32.3.6
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 10.32.3.6
Stopping NTP server: ntpd.
Starting NTP server: ntpd.
bound to 10.32.3.7 -- renewal in 37174 seconds.
On the DHCP server, check the logs:
|   | node-dhcp-server: | tail /var/log/daemon.log |
...
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPDISCOVER from 00:26:b9:3f:43:a1 via eno1
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPOFFER on 10.32.3.7 to 00:26:b9:3f:43:a1 via eno1
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPREQUEST for 10.32.3.7 (10.32.3.6) from 00:26:b9:3f:43:a1 via eno1
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPACK on 10.32.3.7 to 00:26:b9:3f:43:a1 via eno1
In the last four lines, you can see that your own DHCP server has given an address to the other node.
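If you want to check this for many nodes at once, the DHCPACK lines can be extracted from the log programmatically. The sketch below assumes the log format shown in the excerpt above; the helper name acked_leases and the regular expression are written for this example and may need adjusting for other dhcpd versions or syslog formats.

```ruby
# Extract the leases acknowledged by dhcpd from log text.
# Returns one hash per DHCPACK line: IP address, client MAC, interface.
def acked_leases(log_text)
  ack_re = /DHCPACK on (\d+\.\d+\.\d+\.\d+) to ((?:\h{2}:){5}\h{2}) via (\S+)/
  log_text.each_line.map { |line| ack_re.match(line) }.compact.map do |m|
    { ip: m[1], mac: m[2], iface: m[3] }
  end
end

# Sample lines copied from the log excerpt in this tutorial.
sample = <<~LOG
  Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPREQUEST for 10.32.3.7 (10.32.3.6) from 00:26:b9:3f:43:a1 via eno1
  Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPACK on 10.32.3.7 to 00:26:b9:3f:43:a1 via eno1
LOG

acked_leases(sample).each { |lease| puts "#{lease[:mac]} -> #{lease[:ip]} (#{lease[:iface]})" }
```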
DHCP and PXE
For your information, if you need to do a PXE boot, you must change the tftp server in the generated dhcpd configuration file:
If there is no next-server configured, you must edit the file by hand and add a line like this:
next-server XX.XX.XX.XX ;
where XX.XX.XX.XX is the IP of your node (echo $IP).
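One way to make that change without editing the file by hand is to rewrite every next-server directive in the generated configuration. This is a minimal sketch under assumptions: set_next_server is a name invented here, the regular expression only targets directives of the form "next-server ADDRESS;", and the addresses in the example are placeholders.

```ruby
# Replace the address of every next-server directive in dhcpd.conf text.
# set_next_server is a hypothetical helper; it does not add a directive
# when none is present.
def set_next_server(conf_text, server_ip)
  conf_text.gsub(/next-server\s+[^;]+;/, "next-server #{server_ip};")
end

# Placeholder addresses, for illustration only.
example = " filename \"pxelinux.0\";\n next-server 172.16.143.1;\n"
puts set_next_server(example, '10.32.3.6')
```

To apply it to a real file, read the file with File.read, pass the text through the helper, and write the result back with File.write.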
Change the VLAN of your nodes manually
Put your nodes into the reserved VLAN
If you really want to change the VLAN manually, you can, but it's much simpler to change the vlan with kadeploy.
To change the VLAN of the nodes manually, you must reconfigure the network after the VLAN has changed; but once the VLAN has changed, you can no longer connect to the node. An easy way around this is to schedule the reconfiguration beforehand with the 'at' command (apt-get install at if it's not installed on your nodes).
We will use Taktuk to start remote commands on several nodes at once. In this example, we will use all the nodes. Since taktuk does not handle duplicate names in the nodefile, we must first remove duplicates.
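Removing the duplicates only requires keeping the first occurrence of each hostname. A minimal Ruby sketch of that step (unique_nodes is a name chosen for this example; the input is the text of a nodefile such as the one OAR provides):

```ruby
# Return the distinct node names of a nodefile, in first-seen order.
def unique_nodes(nodefile_text)
  nodefile_text.each_line.map(&:strip).reject(&:empty?).uniq
end

# A nodefile typically lists each node once per reserved core.
nodefile = "suno-2.sophia.grid5000.fr\nsuno-2.sophia.grid5000.fr\nsuno-30.sophia.grid5000.fr\n"
puts unique_nodes(nodefile)
```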
First, we will use taktuk to install at on all nodes, then the taktuk command will simply launch the network reconfiguration in one minute. Finally, we set the VLAN of all our nodes.
As we will change the network configuration of the nodes, we will use an isolated kavlan (a.k.a. kavlan-local) so as not to interfere with the rest of the Grid'5000 network.
|   | frontend: | taktuk -s -l root -f $OAR_FILE_NODES broadcast exec [ "apt-get update; apt-get --yes install at" ] | 
|   | frontend: | taktuk -s -l root -f $OAR_FILE_NODES broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ] | 
All nodes are now configured in VLAN 2. In one minute, your nodes will renegotiate their IP addresses and become available inside the VLAN. You can then connect to each of them using kaconsole or ssh (since we use a kavlan-local, you must first connect to the gateway of that kavlan):
You can use the ip neigh command to see the known hosts in your LAN; you should only see IPs in the 192.168.66.0/24 subnet
192.168.66.250 dev eth0 INCOMPLETE
192.168.66.254 dev eth0 lladdr 02:00:00:00:01:02 REACHABLE
You should be able to ping another of your hosts inside your VLAN:
64 bytes from 192.168.211.42: icmp_req=1 ttl=64 time=0.141 ms
64 bytes from 192.168.211.42: icmp_req=2 ttl=64 time=0.166 ms
64 bytes from 192.168.211.42: icmp_req=3 ttl=64 time=0.165 ms
--- suno-42-kavlan-2.sophia.grid5000.fr ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.141/0.157/0.166/0.015 ms
Put your nodes back into the default VLAN
First, put the list of your nodes' in-VLAN hostnames in a file:
Don't forget to schedule the network restart command with taktuk first. This time, taktuk must be run from the kavlan gateway, not from the frontend:
|   | kavlan-VLANID: | taktuk -s -l root -f ./mynodes-vlan broadcast exec [ "echo '/etc/init.d/networking restart' |  at now + 1 minute " ] | 
Then you can put your nodes back in the default VLAN:
You should be able to ping your nodes:
Another way to put back nodes into the default VLAN is to change the vlan and then kareboot the nodes.
Other usage
Using the API
KaVLAN is also available through the API. Using the job and deploy API, you can, as with the command line tools, reserve nodes with a VLAN and deploy nodes into a VLAN. If you want to manipulate VLANs directly through the API, you can do several things:
You can get the vlans you have reserved:
GET https://api.grid5000.fr/stable/sites/SITE/vlans/users/USER
You can get all the vlans available on a site:
GET https://api.grid5000.fr/stable/sites/SITE/vlans/
You can get the VLAN of nodes on a site:
GET https://api.grid5000.fr/stable/sites/SITE/vlans/nodes
You can print the VLAN of a list of nodes:
POST https://api.grid5000.fr/stable/sites/SITE/vlans/nodes {"nodes": [ <list of node names>]}
You can change the VLAN of a list of nodes:
POST https://api.grid5000.fr/stable/sites/SITE/vlans/VLANID/ {"nodes": [ <list of node names>]}
You can start the dhcp server for the vlan
PUT https://api.grid5000.fr/stable/sites/SITE/vlans/VLANID/dhcpd {"action":"start"}
You can stop the dhcp server for the vlan
PUT https://api.grid5000.fr/stable/sites/SITE/vlans/VLANID/dhcpd {"action":"stop"}
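The calls above map directly onto Ruby's standard net/http request objects. The sketch below only builds the requests and does not send them; the function names are invented for this example, the Content-Type header is an assumption, and authentication (which depends on where you run the client) is deliberately left out.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the PUT request that starts or stops the DHCP server of a VLAN.
def dhcpd_request(site, vlan_id, action, base = 'https://api.grid5000.fr/stable')
  req = Net::HTTP::Put.new(URI("#{base}/sites/#{site}/vlans/#{vlan_id}/dhcpd"))
  req['Content-Type'] = 'application/json'   # assumption: JSON payload
  req.body = JSON.generate('action' => action)
  req
end

# Build the POST request that moves a list of nodes into a VLAN.
def set_vlan_request(site, vlan_id, nodes, base = 'https://api.grid5000.fr/stable')
  req = Net::HTTP::Post.new(URI("#{base}/sites/#{site}/vlans/#{vlan_id}/"))
  req['Content-Type'] = 'application/json'   # assumption: JSON payload
  req.body = JSON.generate('nodes' => nodes)
  req
end

req = dhcpd_request('sophia', 4, 'start')
puts "#{req.method} #{req.path} #{req.body}"
```

To actually send a request, pass it to Net::HTTP#request on a connection opened with use_ssl enabled, after adding whatever credentials your access path requires.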
Use a global VLAN
With a global VLAN, you can put nodes from several sites in the same VLAN
First, reserve a global VLAN on one site (here sophia) and 2 nodes each on lille, sophia and lyon:
|   | frontend: | oargridsub -t deploy -w 2:00:00 sophia:rdef="{\\\\\\\"type='kavlan-global'\\\\\\\"}/vlan=1+/nodes=2",lille:rdef=/nodes=2,lyon:rdef=/nodes=2 > oargrid.out | 
Get the oargrid Id and Job key from the output of oargridsub:
|   | frontend: | export OARGRID_JOB_ID=`grep "Grid reservation id" oargrid.out | cut -f2 -d= | cut -d ' ' -f2` | 
Get the node list using oargridstat:
Then use kadeploy3 to deploy your image on all sites and change the VLAN:
|   | frontend: | kadeploy3 -f gridnodes -a http://public.sophia.grid5000.fr/~nniclausse/openmx.dsc -k --multi-server -o ~/nodes.deployed --vlan 18 | 
If you want to manipulate the VLAN of a node directly, you have to run the kavlan command on the site where the node is located. For example, if you have reserved the global VLAN located at sophia and want to put some nodes of lille into this VLAN, you have to run kavlan -m nodename -i VLAN_GLOBAL_ID -s on the lille site (or use the API with the lille site in the URL).
How to use a local VLAN
In this section, we describe the specifics of local VLANs.
If you want to use a local VLAN, you first have to connect to the gateway of the VLAN. Once you have a running reservation on a local VLAN, you have SSH access to the gateway:
Then you can reach your nodes inside the VLAN. Another option is to use the kaconsole command.
(You can still use kadeploy to put your nodes in the VLAN in one step.)
Configure ssh to easily connect to nodes in a local VLAN
You can configure ssh to make the connection through the gateway transparent:
To use ssh transparently to access isolated nodes (local VLAN), add this to your .ssh/config file on the frontend:
Host *-*-kavlan-1 *-*-kavlan-1.*.grid5000.fr
    ProxyCommand ssh -a -x kavlan-1 nc -q 0 %h %p
Host *-*-kavlan-2 *-*-kavlan-2.*.grid5000.fr
    ProxyCommand ssh -a -x kavlan-2 nc -q 0 %h %p
Host *-*-kavlan-3 *-*-kavlan-3.*.grid5000.fr
    ProxyCommand ssh -a -x kavlan-3 nc -q 0 %h %p
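The three stanzas differ only by the VLAN ID, so they can also be generated. This sketch prints the same text as above for the local VLAN IDs 1 to 3; the function name is made up for this example, and you still have to append the output to .ssh/config yourself.

```ruby
# Generate the ssh_config stanzas that proxy connections to nodes of
# local VLANs through the matching kavlan-N gateway.
def ssh_config_for_local_vlans(ids = 1..3)
  ids.map do |id|
    "Host *-*-kavlan-#{id} *-*-kavlan-#{id}.*.grid5000.fr\n" \
    "    ProxyCommand ssh -a -x kavlan-#{id} nc -q 0 %h %p\n"
  end.join
end

puts ssh_config_for_local_vlans
```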
Then you can simply use ssh <cluster>-<nodeid>-kavlan-<vlanid> to access the node, for example: