Production:FAQ

From Grid5000
Revision as of 18:17, 20 January 2025 by Lmirtain (talk | contribs)

About this document

How to add/correct an entry to the FAQ?

Note.png Note

Just like any other page of this wiki, you can edit the FAQ yourself to improve it. If you click on one of the little "edit" links placed after each question, you will be able to edit that particular question. To edit the whole page, simply choose the edit tab at the top of the page.


What's the difference between this FAQ and Grid'5000 FAQ?

Note.png Note

This FAQ is dedicated to HPC and AI production uses of the platform, while the Grid'5000 FAQ deals with uses in the field of network experimentation and distributed computing.

What data spaces are available and how to access them?

Research teams, groups of people from different teams, or individuals can ask for Group storages in order to manage their data at the team level. The main benefit of Group storages is that they allow the members of a group to share their data (corpora, datasets, results, ...) and to easily overcome the quota restrictions of the home directories.

Please remember that NFS servers (the home directories are also served by an NFS server) are quite slow when it comes to processing a huge number of small files during a computation. If you are in this situation, consider doing the major part of your I/O on the nodes' local disks and copying the results back to the NFS server at the end of the experiment.
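As an illustration, a job script following this pattern might look as follows (all paths and the dataset name are placeholders, not actual Grid'5000 conventions):

```shell
#!/bin/bash
# Hypothetical sketch: do the bulk of the I/O on the node's local disk and
# touch the NFS-served home only for one copy in and one copy out.
set -e
LOCAL="/tmp/$USER/exp"                       # node-local scratch, not on NFS
mkdir -p "$LOCAL" "$HOME/exp_results"

# Stand-in for a dataset that would already sit in your home or group storage
mkdir -p "$HOME/datasets/my_corpus"
echo "sample" > "$HOME/datasets/my_corpus/a.txt"

cp -r "$HOME/datasets/my_corpus" "$LOCAL/"   # single bulk copy from NFS
mkdir -p "$LOCAL/results"
wc -l "$LOCAL/my_corpus/a.txt" > "$LOCAL/results/counts.txt"  # the "experiment"
cp -r "$LOCAL/results" "$HOME/exp_results/"  # single bulk copy back to NFS
```

The point is that the experiment itself only reads and writes under $LOCAL; NFS sees two large sequential transfers instead of many small operations.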

See here for the other kinds of storage available on the platform.

Grenoble

FIXME: to be completed

Nancy

Group storages are used to control access to different storage spaces located on the storage[1-5].nancy.grid5000.fr NFS servers (more information about the maximum capacity of each of these servers can be found here). Ask your GGA leader whether your team has access to one or more storage spaces (this is the case, for instance, for the following teams: Bird, Capsid, Caramba, Heap, Multispeech, Optimist, Orpailleur, Semagramme, Sisr, Synalp, Tangram).

Rennes

Group storages are used to control access to different storage spaces located on the storage2.rennes.grid5000.fr NFS server (more information about the maximum capacity of this server can be found here). Ask your GGA leader whether your team has access to one or more storage spaces (this is the case, for instance, for the cidre and sirocco teams (compactdisk storage)).

Sophia

FIXME: to be completed

I am physically located in the LORIA/IRISA building, is there a shorter path to connect?

If you are located in the LORIA/IRISA building, you can benefit from a direct connection that does not go through the Grid'5000 national access machines (access-south and access-north). To do so, use access.nancy or access.rennes (instead of access).

Terminal.png mylaptop:
ssh jdoe@access.nancy.grid5000.fr
Terminal.png mylaptop:
ssh jdoe@access.rennes.grid5000.fr

Configure an SSH alias for the local access

To establish a connection to the Grid'5000 network from the local access, you can configure your SSH client as follows:

Terminal.png laptop:
editor ~/.ssh/config
Host g5kl
  User login
  Hostname access.site.grid5000.fr
  ForwardAgent no

Host *.g5kl
  User login
  ProxyCommand ssh g5kl -W "$(basename %h .g5kl):%p"
  ForwardAgent no

Reminder: login is your Grid'5000 username and site is either nancy or rennes.

With such a configuration, you can:

  • connect to the frontend of your local site
Terminal.png laptop:
ssh g5kl
  • transfer files from your laptop to your local frontend (with better bandwidth than using the national Grid'5000 access)
Terminal.png laptop:
scp myFile g5kl:~/
  • access the frontend of a different site:
Terminal.png laptop:
ssh grenoble.g5kl
  • transfer files from your laptop to a different frontend:
Terminal.png laptop:
scp myFile sophia.g5kl:~/

How to access data hosted on Inria/Loria or Inria/Irisa servers

The Grid'5000 network is not directly connected to the Inria/Loria or Inria/Irisa internal servers. If you want to access them from the Grid'5000 frontends and/or nodes, you need to go through a local bastion host. If you regularly need to transfer data, it is highly recommended to configure the SSH client on each Grid'5000 frontend you use.

Note.png Note

Please note that you have a different home directory on each Grid'5000 site, so you may need to replicate your SSH configuration across multiple sites.
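As a sketch, a small loop can print the copy commands needed to push your configuration from the current frontend to the other sites (the site list is illustrative; adapt it to the sites you actually use, and check the exact internal hostnames for your account):

```shell
# Hypothetical sketch: frontends do not share a home directory, so the same
# ~/.ssh/config must be replicated to every site you work on. This prints
# the rsync commands rather than running them, so you can review them first.
for site in nancy rennes grenoble sophia; do
  echo rsync -a ~/.ssh/config "$site.grid5000.fr:.ssh/"
done
```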

Nancy

ssh-nge.loria.fr is an access machine hosted on Loria side. That machine can be used to access all services in the Inria/Loria environment.

Terminal.png frontend:
editor ~/.ssh/config
Host accessloria
   Hostname ssh-nge.loria.fr
   # replace jdoe below with your LORIA login
   User jdoe

Host *.loria
   ProxyCommand ssh accessloria -W "$(basename %h .loria):%p"
   # replace jdoe below with your LORIA login
   User jdoe
Note.png Note

Given that ssh-nge.loria.fr only accepts logins using an SSH key, you cannot simply connect with your LORIA password.

Rennes

ssh-rba.inria.fr is an access machine hosted on Irisa side. That machine can be used to access all services in the Inria/Irisa environment.

Terminal.png frontend:
editor ~/.ssh/config
Host ssh-rba
   Hostname ssh-rba.inria.fr
   # replace jdoe below with your IRISA login
   User jdoe

Data hosted on Inria's NAS server is accessible under /nfs on ssh-rba.inria.fr. Assuming you have set up this configuration in your home directory on the Grenoble site:

Terminal.png fgrenoble:
scp ssh-rba:/nfs/nas4.irisa.fr/repository ~/local_dir

Transfer files to Grid'5000 storage

With that setup, you can now use:

  • Rsync, to synchronize data between the Inria/Loria environment and your local home on a Grid'5000 frontend
  • Sshfs, to directly mount your data directory from the Inria/Loria environment under your local home, i.e. mount /users/my_team/my_username (origin = ssh-nge.loria.fr) on fnancy (destination = a folder on fnancy).

For example:

Terminal.png fnancy:
sshfs -o idmap=user jdoe@tregastel.loria:/users/myteam/jdoe ~/local_dir

To unmount the remote filesystem:

Terminal.png fnancy:
fusermount -u ~/local_dir

I submitted a job, there are free resources, but my job doesn't start as expected!

Most likely, this is because of our configuration of resource restrictions per walltime. In order to make sure that someone requesting only a few nodes for a small amount of time will be able to get them soon enough, the nodes are split into categories. The split depends on each cluster and is visible in the Gantt chart. An example of split is:

  • 20% of the nodes only accept jobs with walltime lower than 1h
  • 20% -- 2h
  • 20% -- 24h (1 day)
  • 20% -- 48h (2 days)
  • 20% -- 168h (one week)

Note that best-effort jobs are excluded from those limitations.

To see the exact walltime partition of each production cluster, have a look at the Nancy Hardware page or Rennes Hardware page.
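For example, to land in the categories of nodes accepting short jobs, request a walltime at or below that category's limit (the resource shape and script name below are illustrative):

```shell
# Hypothetical example: a 2-node, 1-hour production job fits the shortest
# walltime categories, so more nodes are eligible and it should start sooner.
oarsub -q production -l nodes=2,walltime=1:00:00 ./my_script.sh
```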

Another OAR feature that may impact the scheduling of your jobs is fair-sharing, which is based on the notion of karma: this feature assigns a dynamic priority to submissions based on each user's submission history. With this feature, jobs from users who rarely submit are generally scheduled earlier than jobs from heavy users.

I have an important demo, can I reserve all resources in advance?

There's a special challenge queue that can be used to combine resources from the classic Grid'5000 clusters and the production clusters for special events. If you would like to use it, please ask for a special permission from the executive committee.

Can I use besteffort jobs in production?

Yes, you can submit a besteffort job on the production resources by using OAR's -t besteffort option. Here is an example:

Terminal.png fnancy:
oarsub -t besteffort -q production ./my_script.sh

If you don't specify the -q production option, your job may run on both production and non-production resources.

How to cite / Comment citer

If you use the Grid'5000 production clusters for your research and publish your work, please add this sentence in the acknowledgements section of your paper:

Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).