TakTuk: Difference between revisions

From Grid5000
No edit summary
 
(41 intermediate revisions by 9 users not shown)
Line 1: Line 1:
{{Type|Tip}}
{{Portal|User}}
{{Maintainer|Pierre Lemoine}}
{{Portal|Tutorial}}
{{TutorialHeader}}
 
During the course of an experiment on G5k, one often wants to run programs or scripts on all the reserved nodes simultaneously. For instance, to send data to remote workers. This could be achieved sequentially with a loop construct in a script, but this method does not scale well. That's why a launcher for parallel commands named TakTuk has been installed on each G5k site.
During the course of an experiment on G5k, one often wants to run programs or scripts on all the reserved nodes simultaneously. For instance, to send data to remote workers. This could be achieved sequentially with a loop construct in a script, but this method does not scale well. That's why a launcher for parallel commands named TakTuk has been installed on each G5k site.


= Short Explanation about how it work =  
= What is TakTuk =
To speed up the execution of a command on a large number of nodes, TakTuk  
 
* uses a tree to launch programs on all reserved nodes.
TakTuk is a parallel launcher.
* initiates a few connections in parallel (the deployment window) and distributes the remaining deployment work to already deployed peers using an adaptive work-stealing algorithm (Extract of [http://taktuk.gforge.inria.fr/Documents/TakTuk_UserGuide.pdf A guide of TakTuk use] by Guillaume Huard).
To speed up the remote execution of the same command on a large number of nodes, TakTuk both
* uses a deployment tree to initiate several connections to remote peers at the same time
* initiates locally a few connections in parallel (the deployment window)
The balance between the two mechanisms is ensured by an adaptive work-stealing algorithm combined with an on-line evaluation of the local machine load (see [https://taktuk.gitlabpages.inria.fr/Documents/TakTuk_UserGuide.pdf A guide to TakTuk use] by Guillaume Huard for more details).


[[Image:Taktuk_schema.png|500px]]
[[Image:Taktuk_schema.png|500px]]


Therefore, for taktuk to operate correctly, peers should be able to connect to each other. By default, TakTuk use ssh connections. So to check if Taktuk execution will work, follow instructions on this page: [https://www.grid5000.fr/index.php/Public_key_authentication#Test_key_authentication_3 Test key authentication]
Therefore, for TakTuk to operate correctly, peers should be able to connect to each other. By default, TakTuk uses ssh connections. So, to check if TakTuk execution can work, please follow instructions on this page: [https://www.grid5000.fr/mediawiki/index.php/Public_key_authentication Test key authentication].
There is also a practical example for setting up TakTuk in the [https://www.grid5000.fr/mediawiki/index.php/Advanced_OAR#Using_a_parallel_launcher:_taktuk Advanced OAR tutorial].
 
; External Links
 
* [https://taktuk.gitlabpages.inria.fr/ TakTuk Homepage]
* [https://taktuk.gitlabpages.inria.fr/taktuk.html TakTuk Documentation]


= Using TakTuk with the base environment =
= Using TakTuk in Grid'5000 =
== Prerequisites ==
== Using TakTuk with Grid'5000's default environment ==
=== Prerequisites ===


You have a reservation:  
You own a reservation for some machines:  
  {{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">oarsub</code> -I -l nodes=4}}
  {{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">oarsub</code> -I -l nodes=4}}
Therefore you're on the reserved node:  
You are logged on one of the reserved nodes:  
  {{Term|location=users@node-15.site.grid5000.fr|cmd=<code class="command"></code>}}
  {{Term|location=users@node-15.site.grid5000.fr|cmd=<code class="command"></code>}}
Connections from this node and between nodes with your account are possible
Connections can be made without password from this node and between nodes using your account.


== Using Taktuk to launch scripts ==
=== Using Taktuk to launch scripts ===
To execute scripts simultaneously on all the nodes:
{{Term|location=users@node-15.site.grid5000.fr|cmd=<code class="command">taktuk</code> -c "oarsh" -f $OAR_FILE_NODES broadcast exec [ /home/site/user/testScript.sh ]}}
'''Note''': do not forget to type the characters <code>[</code> and <code>]</code> and separate them from their content (using spaces for instance).


To launch scripts simultaneously on each node:
'''Note''': taktuk uses classic ssh by default. If your job did not reserve all the CPU cores of nodes, you'll need to add  <code>-c "oarsh"</code> to tatktuk command line.
{{Term|location=users@node-15.site.grid5000.fr|cmd=<code class="command">taktuk</code> -f $OAR_FILE_NODES broadcast exec [/home/site/user/testScript.sh]}}
'''Note''': do not forget to type the characters <code>[</code> and <code>]</code>


By default, Taktuk displays a lot of informations but it might not be necessary. So you can add some options on command line to modify correctly the output [the purpose of the connector and status options is not very clear]:
By default, TakTuk displays a lot of informations and part of it might not be necessary. Basically, you can choose exactly what is displayed by TakTuk (type of information and format of the displayed messages) using the output-template option. For instance, the following options suppress the connector and status streams and format the output of commands to use the host name followed by the content of the output:
  -o connector -o status -o output='"$host: $line\n"'
  -o connector -o status -o output='"$host: $line\n"'


Example:
;Example:


To know if time clock are synchronised on each node (Result formated correctly)
To check if time clocks are synchronized on all the nodes (Result formated correctly)
  {{Term|location=users@node-15.site.grid5000.fr|cmd=<code class="command">taktuk</code> -o connector -o status -o output='"$host: $line\n"' -f $OAR_FILE_NODES broadcast exec [date]}}
  {{Term|location=users@node-15.site.grid5000.fr|cmd=<code class="command">taktuk</code> -c "oarsh" -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]}}


Output
;Output
  node-101.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
  node-101.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
  node-23.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
  node-23.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
Line 43: Line 56:
  node-97.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
  node-97.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007


To know cpu's installed on nodes:
;To find CPUs installed on nodes:


  plemoine@node-101:$ taktuk -o connector -o status -o output='"$host: $line\n"' -f $OAR_FILE_NODES broadcast exec ["cat /proc/cpuinfo | grep 'model name'"] | uniq
  {{Term|location=user@node-15.site.grid5000.fr|cmd=<code class="command">taktuk</code> <nowiki>-c "oarsh" -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ "cat /proc/cpuinfo | grep 'model name'" ] | uniq</nowiki>}}


Output
;Output
  node-101.bordeaux.grid5000.fr: model name      :                  Intel(R) Xeon(TM) CPU 3.00GHz
  node-101.bordeaux.grid5000.fr: model name      :                  Intel(R) Xeon(TM) CPU 3.00GHz
  node-23.bordeaux.grid5000.fr: model name        : AMD Opteron(tm) Processor 248
  node-23.bordeaux.grid5000.fr: model name        : AMD Opteron(tm) Processor 248
Line 53: Line 66:
  node-67.bordeaux.grid5000.fr: model name        :                  Intel(R) Xeon(TM) CPU 3.00GHz
  node-67.bordeaux.grid5000.fr: model name        :                  Intel(R) Xeon(TM) CPU 3.00GHz
  node-97.bordeaux.grid5000.fr: model name        :                  Intel(R) Xeon(TM) CPU 3.00GHz
  node-97.bordeaux.grid5000.fr: model name        :                  Intel(R) Xeon(TM) CPU 3.00GHz
'''Note''': instead of using the option -c "oarsh" in each command, we could type the following command :
{{Term|location=users@node-15.site.grid5000.fr|cmd=<code class="command">export</code> TAKTUK_CONNECTOR=oarsh}}
and then taktuk would know that it needs to use oarsh as a connector.
== Using TakTuk with your own environment but without TakTuk installation==
=== Prerequisites ===
Start by making a reservation:
{{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">oarsub</code> -I -l nodes=4 -t deploy}}
Deploy your environment:
{{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">kadeploy3</code> -f $OAR_FILE_NODES -e debian11-x64-base -k}}
=== Using Taktuk to launch programs===
Using the -s option, you can use taktuk to deploy on remote hosts that don't have taktuk installed.
{{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">taktuk</code>  -s -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ "apt-get update" ]}}
To execute an apt-get as root for system update, you have to use the following option: "-l root" before any machine specification.
{{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">taktuk</code> -l root  -s -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ "apt-get update" ]}}
= Some extra examples =
* To execute a simple command like <code class="command">date</code> on 1 node:
<code class="command">taktuk</code> -m node-5.site.grid5000.fr broadcast exec [ date ]
* Same execution but on more than 1 node:
<code class="command">taktuk</code> -m node-5.site.grid5000.fr -m node-10.site.grid5000.fr broadcast exec [ date ]
* Execute command on nodes given by OAR after reservation
<code class="command">taktuk</code> -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]
* Self propagation of TakTuk if it is not available everywhere (works only if you can login from nodes to nodes without typing your password):
<code class="command">taktuk</code> -s -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]


= Using TakTuk with your own environment and no taktuk installation on it=
* Connect to nodes with a different login, <code>root</code> for instance:
<code class="command">taktuk</code> -l root -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]


== Prerequisites ==
* In order to test latencies between nodes using taktuk:


Starting to make a reservation:  
As a user from the frontend, run:
  {{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">oarsub</code> -I -l nodes=4 -q deploy}}
  taktuk -c "ssh -p 6667" -s -o 'state=(($line == 4)?($init_date - $start_date)." s de connection de $host a $peer (dont ".($init_date-$reply_date)." du a TakTuk)\n":"")' -f allnodes.txt b e [ true ] | tee taktuk.log


Deploy environment:
where ''taktuk''  version is at least 3.3 and <code class='file'>allnodes.txt</code> contains the list of the nodes.
{{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">kadeploy</code> -p part -f $OAR_FILE_NODES -e sid-x64-nfs-1.0}}


== Using Taktuk to launch programs==
Then doing a <code class='command'>grep connection taktuk.log | sort -n</code> gives:
[This does not tell me what to do to run without taktuk, only how to run as root: in my opinion, a step is missing]
0.220579862594604 s de connection de azur-25.sophia.grid5000.fr a sol-12.sophia.grid5000.fr (dont 0.047921895980835 du a TakTuk)
To execute command as root for update , you have to use the following option: "-l root" as the first one used on the command line:
0.220623016357422 s de connection de azur-17.sophia.grid5000.fr a sol-36.sophia.grid5000.fr (dont 0.0476739406585693 du a TakTuk)
{{Term|location=users@frontale.site.grid5000.fr|cmd=<code class="command">taktuk</code> -l root -s -o connector -o status -o output='"$host: $line\n"' -f $OAR_FILE_NODES broadcast exec ["apt-get update"]}}
  0.222663879394531 s de connection de azur-1.sophia.grid5000.fr a sol-24.sophia.grid5000.fr (dont 0.0479059219360352 du a TakTuk)
0.225576162338257 s de connection de azur-30.sophia.grid5000.fr a sol-27.sophia.grid5000.fr (dont 0.0478250980377197 du a TakTuk)
0.226027965545654 s de connection de azur-16.sophia.grid5000.fr a sol-31.sophia.grid5000.fr (dont 0.0470089912414551 du a TakTuk)
...

Latest revision as of 15:26, 6 April 2022

Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

During an experiment on G5k, one often wants to run programs or scripts on all the reserved nodes simultaneously, for instance to send data to remote workers. This could be done sequentially with a loop in a script, but that approach does not scale well. That is why TakTuk, a launcher for parallel commands, has been installed on each G5k site.

What is TakTuk

TakTuk is a parallel launcher.

To speed up the remote execution of the same command on a large number of nodes, TakTuk both

  • uses a deployment tree to initiate several connections to remote peers at the same time
  • initiates locally a few connections in parallel (the deployment window)

The balance between the two mechanisms is ensured by an adaptive work-stealing algorithm combined with an on-line evaluation of the local machine load (see A guide to TakTuk use by Guillaume Huard for more details).
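
As a rough illustration of why a deployment tree helps, the sketch below compares two idealized strategies. It is a simplified model and not TakTuk's actual adaptive algorithm: it assumes a fixed window of w connections per node per round, and the function names tree_rounds and flat_rounds are hypothetical.

```shell
# Simplified model (an assumption, not TakTuk's real scheduler):
# in each round, every already-deployed node opens w new connections.

# Rounds needed to reach n remote nodes when deployed peers help.
tree_rounds() (
    n=$1; w=$2
    deployed=1   # the launching node
    rounds=0
    while [ $((deployed - 1)) -lt "$n" ]; do
        deployed=$((deployed * (w + 1)))
        rounds=$((rounds + 1))
    done
    echo "$rounds"
)

# Rounds needed when only the launching node opens connections.
flat_rounds() (
    n=$1; w=$2
    echo $(( (n + w - 1) / w ))
)

tree_rounds 1000 10   # prints 3
flat_rounds 1000 10   # prints 100
```

Under these assumptions, 1000 nodes are reached in 3 rounds with a tree against 100 rounds from a single launcher, which is the scaling gap the two mechanisms above are meant to close.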

[Figure: TakTuk schema]

Therefore, for TakTuk to operate correctly, peers should be able to connect to each other. By default, TakTuk uses ssh connections. So, to check if TakTuk execution can work, please follow instructions on this page: Test key authentication. There is also a practical example for setting up TakTuk in the Advanced OAR tutorial.
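
Before launching TakTuk, it can be useful to confirm that every node accepts a connection without prompting for a password. The following is a minimal sketch, assuming plain ssh with the standard BatchMode and ConnectTimeout options; the function name check_nodes and the CONNECTOR variable are hypothetical and not part of TakTuk or Grid'5000.

```shell
# Report, for each distinct node in a file, whether a non-interactive
# connection succeeds. CONNECTOR is a hypothetical variable used here so
# that the connection command can be swapped for another one.
check_nodes() (
    nodefile=$1
    connector=${CONNECTOR:-ssh -o BatchMode=yes -o ConnectTimeout=5}
    sort -u "$nodefile" | while read -r node; do
        # </dev/null keeps the remote command from consuming the node list
        if $connector "$node" true </dev/null >/dev/null 2>&1; then
            echo "OK   $node"
        else
            echo "FAIL $node"
        fi
    done
)
```

It would be called as check_nodes "$OAR_FILE_NODES" from a reserved node. This only tests connections from the current node; TakTuk's tree also needs node-to-node connections, which this sketch does not cover.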

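External Links

  • TakTuk Homepage (https://taktuk.gitlabpages.inria.fr/)
  • TakTuk Documentation (https://taktuk.gitlabpages.inria.fr/taktuk.html)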

Using TakTuk in Grid'5000

Using TakTuk with Grid'5000's default environment

Prerequisites

You own a reservation for some machines:

users@frontale.site.grid5000.fr:
oarsub -I -l nodes=4

You are logged in on one of the reserved nodes:

users@node-15.site.grid5000.fr:

Connections can be made without a password from this node and between nodes using your account.

Using TakTuk to launch scripts

To execute scripts simultaneously on all the nodes:

users@node-15.site.grid5000.fr:
taktuk -c "oarsh" -f $OAR_FILE_NODES broadcast exec [ /home/site/user/testScript.sh ]

Note: do not forget to type the characters [ and ] and separate them from their content (using spaces for instance).

Note: TakTuk uses plain ssh by default. If your job did not reserve all the CPU cores of its nodes, you need to add -c "oarsh" to the taktuk command line.

By default, TakTuk displays a lot of information, part of which may be unnecessary. You can choose exactly what TakTuk displays (the type of information and the format of the displayed messages) using the output-template option. For instance, the following options suppress the connector and status streams and format command output as the host name followed by the output line:

-o connector -o status -o output='"$host: $line\n"'
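
Since these options are repeated in most commands on this page, they can be wrapped once in a shell function. This is a minimal sketch that uses only the options shown above; the helper name tk is hypothetical.

```shell
# Hypothetical helper: run taktuk with the quiet output options above,
# passing every other argument through unchanged.
tk() {
    taktuk -o connector -o status -o output='"$host: $line\n"' "$@"
}
```

With this in place, tk -f "$OAR_FILE_NODES" broadcast exec [ date ] would be equivalent to typing the three -o options by hand.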
Example

To check whether the clocks are synchronized on all the nodes (with formatted output):

users@node-15.site.grid5000.fr:
taktuk -c "oarsh" -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]
Output
node-101.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
node-23.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
node-24.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
node-67.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
node-97.bordeaux.grid5000.fr: Fri Mar 23 09:38:46 CET 2007
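
Comparing dates by eye works for a handful of nodes but not for hundreds. The sketch below assumes the nodes run date +%s instead of date, so that each output line has the form "host: epoch-seconds"; the function name clock_spread and the sample values are made up for illustration.

```shell
# Print the difference, in seconds, between the largest and smallest
# timestamp found in the last field of each input line.
clock_spread() {
    awk '{
        t = $NF + 0
        if (NR == 1 || t < min) min = t
        if (NR == 1 || t > max) max = t
    }
    END { if (NR > 0) print max - min }'
}

# Made-up sample input in the "host: value" format used on this page.
printf '%s\n' 'node-a: 1174639126' 'node-b: 1174639129' 'node-c: 1174639127' | clock_spread   # prints 3
```

In practice the function would read the output of a taktuk command running [ "date +%s" ]. The commands do not start at exactly the same instant on every node, so a spread of a second or two does not by itself indicate clock drift.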
To find the CPUs installed on the nodes
user@node-15.site.grid5000.fr:
taktuk -c "oarsh" -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ "cat /proc/cpuinfo | grep 'model name'" ] | uniq
Output
node-101.bordeaux.grid5000.fr: model name       :                   Intel(R) Xeon(TM) CPU 3.00GHz
node-23.bordeaux.grid5000.fr: model name        : AMD Opteron(tm) Processor 248
node-24.bordeaux.grid5000.fr: model name        : AMD Opteron(tm) Processor 248
node-67.bordeaux.grid5000.fr: model name        :                   Intel(R) Xeon(TM) CPU 3.00GHz
node-97.bordeaux.grid5000.fr: model name        :                   Intel(R) Xeon(TM) CPU 3.00GHz
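
On a larger reservation, a count per CPU model is easier to read than one line per node. The following sketch post-processes output in the format shown above; the function name summarize_models is hypothetical.

```shell
# Read "host: model name : <model>" lines on standard input and print
# how many lines report each model, most frequent first.
summarize_models() {
    awk '{
        # Strip the host name and the "model name :" label.
        sub(/^[^:]+: model name[ \t]*:[ \t]*/, "")
        count[$0]++
    }
    END { for (m in count) printf "%d %s\n", count[m], m }' | sort -rn
}
```

Fed the five lines above, it reports 3 for the Intel Xeon model and 2 for the AMD Opteron model. It counts lines, so a node that reports several logical CPUs contributes several times unless duplicates are removed first.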

Note: instead of passing the option -c "oarsh" to each command, you can run the following command:

users@node-15.site.grid5000.fr:
export TAKTUK_CONNECTOR=oarsh

after which taktuk uses oarsh as its connector.

Using TakTuk with your own environment but without TakTuk installation

Prerequisites

Start by making a reservation:

users@frontale.site.grid5000.fr:
oarsub -I -l nodes=4 -t deploy

Deploy your environment:

users@frontale.site.grid5000.fr:
kadeploy3 -f $OAR_FILE_NODES -e debian11-x64-base -k

Using TakTuk to launch programs

With the -s option, TakTuk propagates itself to remote hosts that do not have taktuk installed:

users@frontale.site.grid5000.fr:
taktuk -s -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ "apt-get update" ]

To run apt-get as root for a system update, put the option -l root before any machine specification (here, before -f):

users@frontale.site.grid5000.fr:
taktuk -l root -s -o connector -o status -o output='"$host: $line\n"' -f <( uniq $OAR_FILE_NODES ) broadcast exec [ "apt-get update" ]
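
The commands above pass <( uniq $OAR_FILE_NODES ) rather than the file itself, presumably because the OAR node file can list the same node several times (typically once per reserved core). Note that uniq only collapses adjacent duplicates. The sketch below shows the difference with sort -u, which does not depend on line order; the function name unique_nodes and the node names are hypothetical.

```shell
# Print each node of a node file exactly once, whatever the line order.
unique_nodes() {
    sort -u "$1"
}

# Made-up node file in which duplicates are not all adjacent.
tmp=$(mktemp)
printf '%s\n' node-1 node-1 node-2 node-1 > "$tmp"
uniq "$tmp" | awk 'END { print NR }'           # 3 lines: the last node-1 survives
unique_nodes "$tmp" | awk 'END { print NR }'   # 2 lines
rm -f "$tmp"
```

If the node file is already grouped by node, both forms give the same list and the uniq form used on this page is sufficient.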

Some extra examples

  • Execute a simple command such as date on one node:
taktuk -m node-5.site.grid5000.fr broadcast exec [ date ]
  • Run the same command on more than one node:
taktuk -m node-5.site.grid5000.fr -m node-10.site.grid5000.fr broadcast exec [ date ]
  • Execute a command on the nodes given by OAR after a reservation:
taktuk -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]
  • Self-propagate TakTuk when it is not available everywhere (works only if you can log in from node to node without typing your password):
taktuk -s -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]
  • Connect to nodes with a different login, root for instance:
taktuk -l root -f <( uniq $OAR_FILE_NODES ) broadcast exec [ date ]
  • Test latencies between nodes using taktuk:

As a user from the frontend, run:

taktuk -c "ssh -p 6667" -s -o 'state=(($line == 4)?($init_date - $start_date)." s de connection de $host a $peer (dont ".($init_date-$reply_date)." du a TakTuk)\n":"")' -f allnodes.txt b e [ true ] | tee taktuk.log

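where the taktuk version is at least 3.3 and allnodes.txt contains the list of nodes. Here b e abbreviates broadcast exec. The message template is in French: each line reads "<seconds> s to connect from $host to $peer (of which <seconds> due to TakTuk)".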

Running grep connection taktuk.log | sort -n then gives:

0.220579862594604 s de connection de azur-25.sophia.grid5000.fr a sol-12.sophia.grid5000.fr (dont 0.047921895980835 du a TakTuk)
0.220623016357422 s de connection de azur-17.sophia.grid5000.fr a sol-36.sophia.grid5000.fr (dont 0.0476739406585693 du a TakTuk)
0.222663879394531 s de connection de azur-1.sophia.grid5000.fr a sol-24.sophia.grid5000.fr (dont 0.0479059219360352 du a TakTuk)
0.225576162338257 s de connection de azur-30.sophia.grid5000.fr a sol-27.sophia.grid5000.fr (dont 0.0478250980377197 du a TakTuk)
0.226027965545654 s de connection de azur-16.sophia.grid5000.fr a sol-31.sophia.grid5000.fr (dont 0.0470089912414551 du a TakTuk)
...
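
To condense such a log into a few figures, the connection times can be aggregated. This is a minimal sketch that relies on the line format produced by the template above (the time is the first field); the function name latency_summary is hypothetical.

```shell
# Summarize connection times from a TakTuk latency log (file arguments
# or standard input): number of samples, minimum, mean and maximum.
latency_summary() {
    awk '/ s de connection de / {
        t = $1 + 0
        if (n == 0 || t < min) min = t
        if (n == 0 || t > max) max = t
        sum += t; n++
    }
    END { if (n > 0) printf "n=%d min=%.3f avg=%.3f max=%.3f\n", n, min, sum / n, max }' "$@"
}
```

Called as latency_summary taktuk.log, it prints a single line; on the five lines shown above, that line is n=5 min=0.221 avg=0.223 max=0.226.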