CLCAR 2009 OAR Tutorial

This practice session is about running jobs on a cluster. You will learn how to access a Grid'5000 cluster, how to install your data, and how to run your jobs and visualize them.

Access cluster

Enter site

On Grid'5000, you can only submit jobs from inside a site, so you must first connect to one of the sites of the grid.

Once you know where you want to run your job, the command to connect to the site is:
(do not forget to replace login with your Grid'5000 login and site with the chosen site, for example nancy)

Terminal.png outside:
ssh login@access.site.grid5000.fr

Then connect to the frontend machine, from which OAR reservations are made:

Terminal.png accessMachine:
ssh frontend

Note that at some sites, the access machine and the frontend are the same server.

Prepare the experiment environment

An experiment is often composed of the codes to run and their configuration. The experiments of this practice are based on the HPLinpack benchmark. This benchmark quantifies the number of floating-point operations per second achieved while solving a random dense linear system.

The programs to run and their configuration are available outside of the Grid'5000 infrastructure. Thus, we need to retrieve these external data and install them on the cluster where we are currently connected.

Finally we prepare the execution environment of the experiment.

Setup data on the cluster

We retrieve our data on the frontend.

The experiment data are stored in a specific directory on the Grid'5000 frontends. Data retrieval can be performed on the frontend using the following command:

Terminal.png frontend:
cp /home/grenoble/ygeorgiou/runhpl.tgz .

Once done, we unpack our experiment data:

Terminal.png frontend:
tar xvzf ~/runhpl.tgz -C ~/

A directory called runhpl is now visible in our home directory. This directory will be available on every compute node of this site, thanks to an NFS mount of user home directories.

Warning.png Warning

When running programs that generate a lot of disk writes, it is a good idea to direct those writes to each node's local disk rather than to your network-shared home directory. This way, you will avoid NFS troubles such as lags or breakdowns. As the NFS service is provided by a single service node and is shared among all users and all compute nodes, its performance can vary independently of your experiment. Be sure to avoid measurements that depend on the performance of a shared NFS server!
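
As an illustration, here is a hedged sketch of this advice. It assumes the node's local disk is available under /tmp (typical, but worth checking on the cluster you use), and the file name hpl.out is only an example:

# Hypothetical sketch: write heavy output to the node-local disk,
# then copy only the final result back to the NFS-mounted home directory.
OUTPUT_DIR=$(mktemp -d /tmp/hplrun.XXXXXX)
~/runhpl/runhpl > "$OUTPUT_DIR/hpl.out" 2>&1
cp "$OUTPUT_DIR/hpl.out" ~/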

Interactive run

OAR2, the Grid'5000 batch scheduler, has an interactive mode. This mode connects the user to the first of their allocated nodes.

Submission

Submit an interactive job:

Terminal.png frontend:
oarsub -I

OAR2 returns a number. This number identifies our submission on the various visualization interfaces:

OAR_JOB_ID=8670 

You are automatically connected to the first node allocated to your job. OAR2 initializes several environment variables that reflect the current submission. These variables can be used by your script to adapt to the current submission's properties:

Terminal.png node:
env

In particular, the list of your dedicated nodes can be viewed:

Terminal.png node:
cat $OAR_NODEFILE
Note.png Note

Depending on the site where you made the submission, the same node name may appear more than once inside the $OAR_NODEFILE file. OAR2 distinguishes between node CPUs and cores, so a node's name is printed in $OAR_NODEFILE once for each of its cores. Hence each line actually corresponds to a core.
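
Since each line of $OAR_NODEFILE corresponds to a core, a script can derive the core and node counts from it. Here is a minimal sketch, using only the variable shown above:

Terminal.png node:
wc -l < $OAR_NODEFILE
Terminal.png node:
sort -u $OAR_NODEFILE | wc -l

The first command counts the cores allocated to the job, the second the distinct nodes.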

It is time to run our script:

Terminal.png node:
~/runhpl/runhpl

Results are printed on the standard output. Remember that we are measuring the number of floating-point operations per second.

Visualization

Submission is visible on the Monika web interface of the site where it was submitted:

Cluster status cannot be obtained from the command line on the node where you were connected by OAR. You need another terminal connected to the frontend of the site or cluster (do not forget to replace OAR_JOB_ID with your current submission id):

Terminal.png frontend:
oarstat -f -j OAR_JOB_ID

Ending

With an interactive submission, the end of the job is not tied to the end of your script's run. It depends on the connection to the job's main node that OAR automatically opened for you. Thus you can run as many scripts as you want before the job's deadline.

Warning.png Warning

Your current submission uses a default walltime of 1 hour. If you are still connected to the job's nodes after 1 hour, OAR2 will automatically disconnect you.

By quitting the shell opened for you by OAR upon the oarsub command (i.e. the shell on the first node of your job), you will terminate the job. Thus, as we are finished with our job here, you can end it by typing:

Terminal.png node:
exit

Web interfaces and command-line tools reflect the end of the submission:


Visualize cluster

The experiment data are ready. Before running the job, we are now going to analyze the cluster state, which can be visualized in many ways.

oarstat

oarstat is a command-line tool to view current or planned job submissions.

View all submissions:

Terminal.png frontend:
oarstat

View the details of all submissions:

Terminal.png frontend:
oarstat -f

View the details of a specific submission:

Terminal.png frontend:
oarstat -f -j OAR_JOB_ID

View the status of a specified job:

Terminal.png frontend:
oarstat -s -j OAR_JOB_ID

View all submissions from a given user:

Terminal.png frontend:
oarstat -u login

oarnodes

oarnodes is also a command-line tool. It shows cluster node properties:

Terminal.png frontend:
oarnodes

Among the returned information is the current node state. This state is generally Alive or Absent. When nodes are unhealthy, their state is Suspected or Dead.

oarprint

oarprint is a tool providing a pretty print of a job's resources. The command prints a sorted output of the resources of a job with regard to a key property, with a customisable format.
On a node of the job (where $OAR_RESOURCE_PROPERTIES_FILE is defined):

Terminal.png node:
oarprint host -P host,cpu,core -F "host: % cpu: % core: %" -C+

On the submission frontend:

Terminal.png frontend:
oarstat -j OAR_JOB_ID -p | oarprint core -P host,cpuset,memcore -F "%[%] (%)" -f -

Monika

Monika is a web interface that synthesizes the information given by oarstat and oarnodes. The current node states are displayed, and the list of current and planned submissions is printed at the bottom of the web page.

Starting from the site list web page, click on the site or the cluster where you are currently connected to view its current state:

DrawOARGantt

Monika displays current node status and current and planned job submissions. DrawOARGantt is also a web interface, but it shows past, current and planned node states on a temporal diagram.

Starting from the site list web page, click on the site or the cluster where you are currently connected to briefly explore the web interface options:

Ganglia

Node load, memory usage, cpu usage and so on are available with the Ganglia web interface.

Starting from this interface's home page, click on the site or the cluster where you are currently connected to quickly analyze the available metrics and diagrams:

Note.png Note

helpdesk.grid5000.fr is another Grid'5000 community website where you also have to log in with your Grid'5000 account information.

By default Ganglia displays current metric values, but you can also view a summary of these values over the past year.

Passive run

OAR2, the Grid'5000 batch scheduler, has a passive mode. In this mode, a script specified at submission time is run on the main dedicated node. This script must be aware of the dedicated nodes in order to split its work among them.
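
To illustrate, here is a minimal hypothetical sketch of such a script (it is not the runhpl script itself). It distributes a command over the job's distinct nodes using $OAR_NODEFILE and oarsh, both presented in this tutorial:

#!/bin/bash
# Hypothetical sketch: run one command on each distinct node of the job.
# $OAR_NODEFILE contains one line per core, so duplicates are removed first.
for node in $(sort -u "$OAR_NODEFILE"); do
    oarsh "$node" hostname &
done
wait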

Submission

Submit a passive job with our script:

Terminal.png frontend:
oarsub ~/runhpl/runhpl
Note.png Note

The environment variables described for our interactive submission are also initialized by OAR for passive jobs.

Visualization

The submission is visible on the Monika web interface of the site where it was submitted:

You can also view the current cluster status with the command-line tools (do not forget to replace OAR_JOB_ID with your current submission id):

Terminal.png frontend:
oarstat -f -j OAR_JOB_ID

Sometimes it is useful to know when a passive job starts. This can be done with the "--state" or "-s" option of oarstat. This option is optimized to get the status of a specified job faster than the "-f" option, and is especially designed for scripting usage.

Terminal.png frontend:
oarstat -s -j OAR_JOB_ID

The following script launches the runhpl program as a passive job and waits until the job starts.
launchJob.sh:

#!/bin/bash

# Path of the script to run inside the passive job
my_script="$HOME/runhpl/runhpl"

# Submit the job and extract the job identifier from the oarsub output
oar_job_id=$(oarsub "$my_script" | grep "OAR_JOB_ID" | cut -d '=' -f2)

# File where OAR will write the job's standard output
oar_stdout_file="OAR.$oar_job_id.stdout"

# Poll the job state until it reaches the Running state
until oarstat -s -j "$oar_job_id" | grep -q Running ; do
    echo "Waiting for the job to start..."
    sleep 1
done

echo "Job $oar_job_id is started!"

To let you analyze the standard output (stdout) and standard error (stderr) of the main node, where our script is run, OAR puts these outputs in files named after the job id. Depending on your submission id, you should see in your home directory:

OAR.OAR_JOB_ID.stdout
OAR.OAR_JOB_ID.stderr

Thus you can follow your job's run live:

Terminal.png frontend:
tail -f OAR.OAR_JOB_ID.stdout

To quit job monitoring, please press Ctrl + c.

You can specify the files that will store the standard output and error streams of the job by using the -O and -E options of the oarsub command. For example, it is possible to redirect the job's standard output to /dev/null or to another file:

Terminal.png frontend:
oarsub ~/runhpl/runhpl -O /dev/null
Terminal.png frontend:
oarsub ~/runhpl/runhpl -O ~/runhpl.log

Ending

An OAR passive submission automatically ends when its script ends. You can see this on the Monika web interface of the site where the script was submitted; the job's dedicated nodes return to their available state:

The submission left its trace on the DrawOARGantt web interface of its host site:

The submission has also disappeared from the command-line status output:

Terminal.png frontend:
oarstat

Our job's results are available at the end of the standard output file:

Terminal.png frontend:
tail OAR.OAR_JOB_ID.stdout

Node number specification

Until now our submissions used the default number of nodes: 1 node.

We are now going to submit an interactive job on 2 nodes:

Terminal.png frontend:
oarsub -I -l nodes=2

Because this is an interactive submission, we are automatically connected to one of these 2 nodes. We can inspect our dedicated resources:

Terminal.png node:
cat $OAR_NODEFILE

As you can see, the benchmark script detects the number of available nodes and can adapt to 1, 2 or 4 nodes:

Terminal.png node:
~/runhpl/runhpl

We can verify that the run also occurs on the other node by using another terminal (do not forget to replace OAR_JOB_ID with the current job identifier and hostname with the allocated node's name):

Terminal.png frontend:
export OAR_JOB_ID=OAR_JOB_ID
Terminal.png frontend:
oarsh hostname ps -C xhpl.x86_64

Planning

Until now our submissions used the default start time and the default duration: immediate start and 1 hour duration. We are going to submit jobs with a specific duration and a delayed start.

Let us run the script on December 7th, 2008 at 3:30pm for a 10-minute duration:

Terminal.png frontend:
oarsub -r '2008-12-07 15:30:00' -l nodes=2,walltime=0:10:00 ~/runhpl/runhpl
Note.png Note

You can submit a delayed job without specifying a script. In this case, the allocated nodes wait for activity from the job's owner.

The delayed submission appears as Scheduled on the Monika web interface of the site where it was submitted, or on the command line (do not forget to replace OAR_JOB_ID with your submission id):

Terminal.png frontend:
oarstat -f -j OAR_JOB_ID

When it is time for the submission to begin, you can connect to its main node to run the script interactively or to monitor its run:

Terminal.png frontend:
oarsub -C OAR_JOB_ID -I

The submission does not end when you disconnect from its main node, even if you did not specify a script to run. It ends when the specified script ends or when the walltime expires. If you did not specify a script and you are finished, it is a good idea to release the allocated nodes early:

Terminal.png frontend:
oardel OAR_JOB_ID

Note that in this case your job will end up in the Error state.

Quick Glossary

Please refer to the following definitions to better understand the different OAR mechanisms:

OAR Grid

OAR Grid usage is presented in the next tutorial: Grid experiment.