G5k-checks

From Grid5000


Get the sources

Download release archives at https://gforge.inria.fr/frs/?group_id=150

Get the latest development from the Grid'5000 Subversion repository:

  • via SSH,
svn checkout svn+ssh://<developer username>@scm.gforge.inria.fr/svn/grid5000/admin/trunk/g5k-checks
  • via DAV,
svn checkout --username <developer username> https://scm.gforge.inria.fr/svn/grid5000/admin/trunk/g5k-checks

Description

g5k-checks

  • g5k-checks is expected to be integrated into the production environment of the Grid'5000 computational nodes. It gathers a collection of programs which check that a node meets several basic requirements before it declares itself as available to the OAR server.
  • This lets the admins enable some checkers which may be very specific to the hardware of a cluster.

g5k-checks executes at boot time in two phases:

  • Phase 1
    • An init script, /etc/init.d/g5k-checks, runs all checkers that must run early enough in the boot process.
    • They are listed in the variable CHECKS_FOR_INIT in the configuration file.
    • Then it enables all checkers listed in the variable CHECKS_FOR_OAR for Phase 2.
  • Phase 2
    • This phase strongly relies on the check mechanism provided by OAR and the oar-node configuration file (/etc/default/oar-node for deb distros, /etc/sysconfig/oar-node for rpm ones).
    • The oar-node flavour of the OAR installation embeds an hourly cron job, /usr/lib/oar/oarnodecheckrun, which runs all executable files stored in /etc/oar/check.d/. The server then periodically invokes /usr/bin/oarnodecheckquery remotely. This command returns a non-zero status if there are files in /var/lib/oar/check.d/ and 0 otherwise. So if a checker in /etc/oar/check.d/ finds something wrong, it simply has to create a log file in /var/lib/oar/check.d/.
    • The version of /etc/(default|sysconfig)/oar-node that g5k-checks installs runs both the oarnodecheckrun and oarnodecheckquery scripts. If the latter fails, the node is not ready to start: the script loops on running both until either oarnodecheckquery returns 0 or a timeout is reached; if the timeout is reached, it does not attempt to declare the node as "Alive".
    • During Phase 1, enabling a checker simply amounts to adding a symbolic link in /etc/oar/check.d/ to its "oar-node driver", i.e. a short script file which interfaces the core checker with the OAR check mechanism.
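Put together, the Phase 1 enable/disable mechanism can be sketched as follows (a minimal sketch: the directory variables are assumptions, the real values come from the g5k-checks configuration file):

```shell
#!/bin/sh
# Sketch of the Phase 1 enabling step (the directory variables are
# assumptions; the real values come from the g5k-checks configuration).
G5KCHECKS_HOME=${G5KCHECKS_HOME:-/usr/local/g5k-checks}
OAR_CHECKDIR=${OAR_CHECKDIR:-/etc/oar/check.d}

enable_checker() {
    # Enabling a checker is only a symbolic link from the directory
    # scanned by oarnodecheckrun to the checker's oar-node driver.
    ln -sf "$G5KCHECKS_HOME/check.d/$1" "$OAR_CHECKDIR/$1"
}

disable_checker() {
    rm -f "$OAR_CHECKDIR/$1"
}
```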

At any moment while the node is running, g5k-checks may be called to disable or enable the checks. This is expected to be used by the OAR prologue and epilogue:

  • /etc/init.d/g5k-checks stop: disable OAR checks
  • /etc/init.d/g5k-checks start: enable OAR checks for oarnodecheckrun
  • /etc/init.d/g5k-checks startrun: enable OAR checks for oarnodecheckrun and run the oarnodecheckrun/oarnodecheckquery pair once immediately, without waiting for the hourly cron job.

Basically, the OAR prologue should call "stop", while the epilogue should call "startrun".
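As a sketch, a site prologue/epilogue could wrap those calls like this (hypothetical wrapper functions; real OAR prologue/epilogue scripts are site-specific):

```shell
#!/bin/sh
# Sketch of the OAR prologue/epilogue hooks (hypothetical wrapper;
# real prologue/epilogue scripts are site-specific).
G5KCHECKS_INIT=${G5KCHECKS_INIT:-/etc/init.d/g5k-checks}

prologue() {
    # Disable the OAR checks while a job runs on the node.
    "$G5KCHECKS_INIT" stop
}

epilogue() {
    # Re-enable the checks and run the oarnodecheckrun/oarnodecheckquery
    # pair once, without waiting for the hourly cron job.
    "$G5KCHECKS_INIT" startrun
}
```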

  • At installation time, g5k-checks configures the local syslog daemon: it first looks for a free <n> such that the local<n>.alert and local<n>.warning selectors are not used, then defines them with the action "@syslog". If there is no "syslog" host on the local network, it defaults to writing messages to the local syslog file.
  • The checkers use the local<n> facility to report only important messages. They use a local log file for debugging messages. Please see the Logging section below for further details.

g5k-parts

g5k-parts is designed to run at both phases of g5k-checks (see above).

  • In Phase 1, g5k-parts validates the partitioning of a Grid'5000 computational node against the G5K Node Storage convention: all partitions but /tmp are primary, and /tmp is a logical partition inside the only extended partition.
  • It first compares /etc/fstab with its backup generated at deployment time. When errors are found at this level, /etc/fstab is reset and the machine reboots.
  • Then, for every partition given on the command line, it first matches the partition's geometry on the hard drive against the partition layout saved at deployment time. It may perform several other checks (e.g. on the formatting of the partition) depending on the partition role. It attempts to fix errors, which it reports to the syslog system. Sometimes g5k-parts cannot fix the error (e.g. hard drive errors); it can then only prevent the node from declaring itself as alive to the OAR server, with a simple stamp file (not executable!) whose existence is tested by the oar-node driver of g5k-parts in Phase 2:

/etc/oar/check.d/g5k-parts-init-failed

  • There is a special processing for NFS shares which the node must mount at boot time. Sometimes such mounts fail through no fault of the node itself, but because of the NFS server(s) or the network connection. To avoid blocking the boot process (and having kadeploy3 fail because of a timeout when it should not), g5k-parts tries every mount only once. Subsequent tries are done in Phase 2.
  • In Phase 2, the oar-node driver of g5k-parts calls the script with a single argument, "nfs", in order to limit the checks to the NFS shares.
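The behaviour of the g5k-parts oar-node driver in Phase 2 can be sketched as follows (illustration only; the g5k-parts install path and the messages are assumptions, while the stamp file path and the CHECKLOGFILE variable are those described above and in the "Adding a new Check" section):

```shell
#!/bin/sh
# Sketch of the g5k-parts oar-node driver in Phase 2 (the g5k-parts
# install path is an assumption; the stamp file and CHECKLOGFILE are
# those described in the text).
STAMP=${STAMP:-/etc/oar/check.d/g5k-parts-init-failed}
G5KPARTS=${G5KPARTS:-/usr/local/bin/g5k-parts}   # assumed install path

g5k_parts_driver() {
    # If Phase 1 left its stamp file, the node must stay suspected.
    if [ -e "$STAMP" ]; then
        echo "g5k-parts failed at boot time" > "$CHECKLOGFILE"
        return 1
    fi
    # Otherwise limit the periodic check to the NFS shares.
    if ! "$G5KPARTS" nfs; then
        echo "g5k-parts: NFS shares check failed" > "$CHECKLOGFILE"
        return 1
    fi
    return 0
}
```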

g5k-proc

  • g5k-proc verifies that the processor model and the processor frequency of the node are those expected.
  • To perform this test, g5k-proc relies on g5k-api-lib.rb, which parses the Reference API to retrieve information about the processor model and frequency.
  • It then compares the retrieved information with the contents of /proc/cpuinfo.
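The comparison can be sketched in shell (the real checker is a ruby script using g5k-api-lib.rb; here the expected model is taken as a parameter for illustration, and the frequency would be checked the same way):

```shell
#!/bin/sh
# Sketch of the g5k-proc comparison (the real checker is a ruby script
# using g5k-api-lib.rb; the expected model is a parameter here, and the
# frequency would be checked in the same way).
check_cpu_model() {
    expected=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # Extract the model name of the first processor.
    model=$(sed -n 's/^model name[[:space:]]*: //p' "$cpuinfo" | head -n 1)
    [ "$model" = "$expected" ]
}
```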

g5k-ethernet-link

  • g5k-ethernet-link checks whether the Ethernet link has been correctly negotiated.
  • Like g5k-proc, it parses the Reference API and retrieves information about the Ethernet devices, such as the rate.
  • It then compares the retrieved value with the output of ethtool.
  • This checker could be enhanced to handle InfiniBand, Myrinet, or the second Ethernet interface if one is configured in the postinstall of the production environment.
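A minimal sketch of that comparison (the interface name and expected rate would come from the Reference API; the ETHTOOL indirection is only there for illustration):

```shell
#!/bin/sh
# Sketch of the g5k-ethernet-link comparison (the real checker gets the
# expected rate from the Reference API; the ETHTOOL indirection and the
# parameters are assumptions for illustration).
ETHTOOL=${ETHTOOL:-ethtool}

check_link_speed() {
    iface=$1
    expected=$2        # e.g. "1000Mb/s", as printed by ethtool
    # Extract the negotiated speed from the ethtool output.
    speed=$("$ETHTOOL" "$iface" 2>/dev/null | sed -n 's/^[[:space:]]*Speed: //p')
    [ "$speed" = "$expected" ]
}
```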

Installation

Production environment

Dependencies

  • deb distributions
    • ethtool
    • rubygems + run "gem install rest-client"
  • rpm distributions
    • ethtool
    • rubygem-mime-types
    • rubygem-rest-client

Check that the variables defined in Makefile.defs do match your needs.

make DISTRIB=(rpm|deb) install

Check that the syslog configuration which was generated is correct.

Postinstalls of the Production Environment

Please read carefully the instructions printed out by the make command above.

Adding a new Check

In this section we assume that you want to add a check named "new-check". First create the new-check/ directory at the root of the source tree.

Simple Case

In the "simple" case, there are only three files involved:

  • bin/new-check, the core checker
  • check.d/new-check, the oar-node driver
  • Makefile:
topdir = ..
CHECK = new-check
include $(topdir)/Makefile.check

The core checker is the core part of the check. It can be a bash or a ruby script, but if it needs information from the G5K API, it is better developed in ruby, since the ruby library g5k-api-lib.rb provides all the facilities to retrieve that information.

The core checker MUST use the logging functions provided by the bash and ruby libraries, with the conventions listed in the Logging section below.

The oar-node driver interfaces the check with the oar-node check mechanism. It is called without any argument, but with one environment variable set, CHECKLOGFILE: the path of a file in the directory that oarnodecheckquery searches. The oar-node driver should create this file if the check fails. Just look at the examples in the check.d/ directories of existing checks.
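A minimal oar-node driver could look like this (sketch only; the install path of the core checker is an assumption):

```shell
#!/bin/sh
# Sketch of the oar-node driver of "new-check" (the install path of the
# core checker is an assumption; CHECKLOGFILE is set by the oar-node
# check mechanism as described above).
new_check_driver() {
    core=${NEWCHECK:-/usr/local/bin/new-check}   # assumed install path
    if ! "$core" >/dev/null 2>&1; then
        # The check failed: create the checklog file so that the next
        # oarnodecheckquery run reports the node as suspected.
        echo "new-check failed on $(hostname)" > "$CHECKLOGFILE"
        return 1
    fi
    return 0
}
```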

Logging

The core checker MUST use the logging functions provided by g5k-api-lib.rb and libg5kchecks.bash, with the following convention:

  • log A, "msg": send msg as an alert to the syslog system
  • log W, "msg": send msg as a warning to the syslog system
  • log I, "msg": send msg to a local log file, /var/log/g5k-checks.log

$LOG may be used to redirect the stdout and/or stderr of some commands.

  • The alert level must be used for errors which prevent the node from being qualified for OAR reservation. An admin must have a look.
  • The warning level must be used when the error is severe enough for admins to be interested in having a look at it, but which does not disqualify the node for OAR reservation: typically, when the checker can fix it.
  • Alert and Warning level messages should not be longer than one line, as far as possible.
  • The information level is intended to give complementary information to the admin who would have a look: feel free to log any kind of information there. At this level, you can use any of the following methods:
    • through the function: log I "msg"
    • directly into $LOG: e.g. >> ${LOG} 2>&1

BEWARE: always APPEND text, NEVER ERASE preceding information!

Finally, note that your checker may be called every hour by the oar-node check mechanism, so try to make it as silent as possible when it is run from the oar-node driver...
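A checker body following these conventions might look like this (sketch; some_check_command is a hypothetical command, and the library providing log and $LOG is assumed to be sourced beforehand):

```shell
#!/bin/bash
# Sketch of a checker body following the logging conventions above; the
# log function and $LOG come from libg5kchecks.bash (install path
# assumed), and some_check_command is a hypothetical command.
# . /usr/local/g5k-checks/lib/libg5kchecks.bash

check_something() {
    log I "starting the example check"
    # Redirect command output into $LOG: always APPEND (>>), never erase.
    if ! some_check_command >> "${LOG}" 2>&1; then
        log A "example check failed, the node must not be qualified"
        return 1
    fi
    log W "example check found and fixed a minor issue"
    return 0
}
```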

Complex Case

Some checks may need to run earlier in the boot sequence than oar-node. Just as for the oar-node driver of the simple case, you must develop an "init" driver and move it to new-check/check.d/new-check. /etc/init.d/g5k-checks will call this driver with no argument before it enables the oar-node checks (see the Description section above). For instance, g5k-parts needs both an init and an oar-node driver.

Some checks may need more data than those set in the global g5k-checks configuration file. Those data shall be stored in $G5KCHECKS_HOME/data/ after installation. Their installation (and their generation, if needed) shall be managed by new-check/Makefile. The Makefile of the simple case will not fit, of course. The only requirement on the Makefile you develop is that the following targets be defined (even if no command is associated):

         install:
         install-msg:
         clean:
         uninstall:
         uninstall-msg:

Once again, g5k-parts is a good example: its Makefile is based on the simple case Makefile, but adds prerequisites to the compulsory targets... Note that it also makes full use of the variables defined in Makefile.defs.

Libraries

  • For checks that need to retrieve the configuration of the nodes, the script /lib/g5k-api-lib.rb establishes a connection to the api-proxy and loads this configuration into the file /data/node_configuration.
    • If this file already exists, there is no need to establish the connection: the file is read and no connection is performed.
    • As a consequence, the connection to the api-proxy is only made while deploying the production environment.
    • If the api-proxy is not reachable, /data/node_configuration will be used.
    • If the api-proxy is not reachable and this file does not exist, the node declares itself as suspected.
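The caching logic above can be sketched as follows (the real code is ruby inside g5k-api-lib.rb; fetch_from_api_proxy is a hypothetical placeholder for the actual HTTP request, and CACHE for the data file):

```shell
#!/bin/sh
# Sketch of the node-configuration caching logic of g5k-api-lib.rb
# (the real code is ruby; fetch_from_api_proxy is a hypothetical
# placeholder for the actual HTTP request, and CACHE for the data file).
get_node_configuration() {
    cache=${CACHE:-/usr/local/g5k-checks/data/node_configuration}
    # If the local copy exists, read it: no connection is performed.
    if [ -r "$cache" ]; then
        cat "$cache"
        return 0
    fi
    # Otherwise query the api-proxy and keep a local copy of the answer.
    if fetch_from_api_proxy > "$cache" 2>/dev/null; then
        cat "$cache"
        return 0
    fi
    # No cache and no api-proxy: the node must declare itself suspected.
    rm -f "$cache"
    return 1
}
```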

If you find some code duplicated in several checkers, then feel free to move it to the lib/ directory, either in the existing files if it makes sense, or in a new library. Only make sure the checker will load it...

Discussions @CT-60

(and later...)

The feed-back from the team at CT-60 and some further discussions with B. Bzeznik led to some concrete propositions. The decisions which remain to be made are in italic.

Fail.png: still waiting for a decision. InProgress.png: decision made, implementation in progress. Check.png: decision made and implementation done in the SVN repository.
Check.png oarnodecheckrun should not rely on cpusets to find running jobs
FALSE: even with -t allow_classic_ssh, the cpusets are defined: this is performed by the job_resource_manager, not by the ssh connection.
Check.png The node should not be suspected if the API server is not available.
Pascal M. said it was too constraining to require the API server to be running when deploying the nodes. It is already fixed in the SVN repository: a simple warning is issued instead of an alert, and the check which needed this information is skipped.
Check.png The API branch should be configurable.
Check.png The HTTP cache mechanisms should be used to retrieve the node configuration from the API server.
Philippe R. is in charge of that (with Cyril R.'s help?). The current implementation relies on the existence of the local copy only, with no consideration of the expiration date.
Check.png g5k-checks.conf should be managed by the admin postinstall.
This is a bit annoying, because installing g5k-checks already requires modifications of the environment postinstall (for g5k-parts). But it has been decided that the environment postinstall should not depend on the site, and g5k-checks.conf sets the list of checks to be run, which may depend on the cluster. Of course, we g5k-checks developers will ensure that the default installed version of g5k-checks.conf suits most clusters. But "exotic" clusters will still require a customized g5k-checks.conf in their admin postinstall (only for production environments!).
Check.png /tmp remounted RO
g5k-parts tackles this issue at boot time. It now checks for it (actually, all tests about /tmp are performed) along with the periodical checks for NFS mounts.
Fail.png oarnodecheckrun should set a timeout on the run of every check
Not that easy in bash, but we found a solution. There still remain a few things to decide.
What if a check times out? David M. thinks oarnodecheckrun should kill what it can and report the error. Philippe C. thinks only one operation should be performed: the creation of the checklog file, which tags the node as suspectable for oarnodecheckquery. There should be no attempt to kill the check process, and no attempt to run the next checks. The reasons are:
  1. oarnodecheckrun does not know if any fork or exec has been performed by the checker, and it could be a mess to correctly kill it.
  2. the check could still succeed in the end: the OAR driver of the check would then remove all checklog files, and the next oarnodecheckquery execution would succeed, setting the node back to alive.
  3. we do not want checker developers to bother about possible concurrent execution with another check.
The control of the timeout will be given to oarnodecheckrun. The OAR drivers may define a TIMEOUT variable, which oarnodecheckrun will take as a replacement for its default value, 30s.
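Philippe C.'s proposal could be sketched like this (illustration only, not the actual oarnodecheckrun code; note that the timed-out check is tagged but never killed):

```shell
#!/bin/sh
# Sketch of the discussed timeout scheme for oarnodecheckrun
# (illustration only): run one oar-node driver in the background and
# wait at most $TIMEOUT seconds for it, polling a completion flag.
run_with_timeout() {
    driver=$1
    seconds=${TIMEOUT:-30}            # drivers may override the 30s default
    flag=$(mktemp) && rm -f "$flag"   # completion flag, absent while running
    { "$driver"; echo "$?" > "$flag"; } &
    while [ "$seconds" -gt 0 ]; do
        if [ -f "$flag" ]; then       # the driver finished in time
            read status < "$flag"
            rm -f "$flag"
            return "$status"
        fi
        sleep 1
        seconds=$((seconds - 1))
    done
    # Timed out: only tag the node as suspectable for oarnodecheckquery;
    # no attempt to kill the check or to run the next ones.
    echo "$driver timed out" > "$CHECKLOGFILE"
    return 1
}
```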
InProgress.png The pingchecker at job end is called for deploy jobs too, which have the nodes systematically suspected.
Bruno B. will ensure the pingchecker is not called on Absent or scheduled-to-be-Absent nodes. Since the frontend epilogue requests for the job nodes to be set Absent, the job-end-pingchecker will not be called on them. This is fixed in OAR SVN repository (see the bug entry in the OAR bug tracker system).
InProgress.png Problem of the pingchecker timeout.
In the current implementation, oarnodecheckquery would launch the checks if they had not been run for too long. This may actually happen at job end. The problem is that it requires the checks to run within the pingchecker timeout, or we would need a different timeout for the job end pingchecker, or even better, another pingchecker for job end. These solutions sound too tricky and too G5K-specific to OAR developers' ears. In the meantime, we realized that it was actually possible (and reliable) to run the checks in the job epilogue on the first node of the job. So the new scheme will be the following:
  1. oarnodecheckquery is reset to its simple form, only checking for any checklog file existence. It is the OAR pingchecker, called every 5 minutes AND at the end of non-deploy jobs.
  2. oarnodecheckrun is still launched hourly by cron and runs the checks only if no job at all is running on the node.
  3. The OAR epilogue on the first node will "taktuk" oarnodecheckrun <JOBID> <JOBUSER> on all nodes of the reservation, then exit 0 to avoid setting the job state as ERROR. If some checks fail on some nodes, then the job end pingchecker will suspect those nodes, and not all nodes of the job. The additional arguments to oarnodecheckrun make it accept to run the checks if the specified job is the only one remaining on the node.
This solution gets rid of any stamp file.
Discussions about the framework
We g5k-checks developers think RSpec is not well suited for this software: it looks more designed for software regression/test suites. Furthermore, we discovered this solution a bit late, when the current g5k-checks framework was already set. But this is no good reason, of course.
The RSpec framework could suit the hourly checks, but it does not tackle all aspects of g5k-checks, which may run checks at boot time, just like g5k-parts.
We found that solution tough for Ruby newbies like us. It is quite constraining and makes it difficult to integrate non-Ruby checkers (like g5k-parts, again). The main idea of g5k-checks is that people may develop some tests on their own. Then, when their test programs are mature enough, they can easily be integrated into g5k-checks as standalone checkers. The RSpec framework would discourage this, in our opinion.
Finally, we have been accused of "reinventing the wheel". But g5k-checks MUST interact with OAR. It would definitely have been "reinventing the wheel" if we had ignored the oarnodecheck mechanism. Moreover, the oarnodecheck mechanism is already a test framework as a whole. So what? Should we have used a framework inside another framework?
Yes, the RSpec framework is very attractive, but we must be careful not to fall into this common trap of getting things more complex for the sake of simplicity.