Introduction

The Nancy and Rennes Grid'5000 sites also host clusters for production use (including clusters with GPUs). See Nancy:Hardware and Rennes:Hardware for details.

The usage rules differ from the rest of Grid'5000:

Advance reservations (oarsub -r) are not allowed (to avoid fragmentation). Only submissions (and reservations that start immediately) are allowed.
All Grid'5000 users can use those nodes (provided they meet the conditions stated in Grid5000:UsagePolicy). However, users outside of LORIA / Centre Inria Nancy -- Grand Est and IRISA / Centre Inria de l'Université de Rennes are expected to give priority to their own local production resources, and to use these nodes mostly for tasks that require Grid'5000 features. Examples of local production clusters are Cleps (Paris), Margaret (Saclay), Plafrim (Bordeaux), etc.

Using the resources

Getting an account

Users from LORIA (LORIA / Centre Inria Nancy -- Grand Est) and IRISA (IRISA / Centre Inria de l'Université de Rennes) who want to access Grid'5000 primarily for production usage must use the request form to open an account, like regular Grid'5000 users.

Fill in the fields as follows:
- Group Granting Access (GGA): the group named after your research team or, if your team is not in the list below, loria (for Nancy) or igrida (for Rennes).
- Laboratory: LORIA or IRISA
- Team: INTUIDOC, SYNALP, LACODAM, MULTISPEECH, SERPICO, CARAMBA, CAPSID, SIROCCO, ORPAILLEUR, LARSEN, CIDRE, SEMAGRAMME, LINKMEDIA, SISR, TANGRAM...

Other users from Nancy (not belonging to the Loria laboratory) can ask to join using the nancy-misc Group Granting Access while other users from Rennes (not belonging to the Irisa laboratory) can ask to join using the rennes-misc Group Granting Access.

Users are automatically subscribed to the Grid'5000 users mailing list, users@lists.grid5000.fr. This list is the user-to-user and user-to-admin channel for help and support requests about Grid'5000. The technical team can be reached at support-staff@lists.grid5000.fr.

Learning to use Moyens de Calcul hosted by Grid'5000

Refer to the Production:Getting Started Production tutorial (derived from the Getting Started Grid'5000 tutorial). Other tutorials are listed on the Users Home page.

Using deep learning software on Grid'5000

A tutorial for using deep learning software on Grid'5000, written by Ismael Bada, is also available.

Using production resources

To access production resources, submit jobs to the production queue with the -q production option. Job submissions in the production queue are prioritized according to who funded the hardware. There are four priority levels, each with a maximum job duration:

p1 -- 168h (one week)
p2 -- 96h (four days)
p3 -- 48h (two days)
p4 -- 24h (one day)
You may also have access to the clusters on besteffort.
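The limits above can be checked before submitting a job. The following is a minimal sketch: the function names max_walltime_hours and walltime_fits are hypothetical helpers (they are not part of OAR or Grid'5000), and walltimes are assumed to be given in whole hours.

```shell
# Hypothetical helper (not an OAR command): print the maximum job duration,
# in hours, for a production priority level, using the limits listed above.
max_walltime_hours() {
  case "$1" in
    p1) echo 168 ;;
    p2) echo 96 ;;
    p3) echo 48 ;;
    p4) echo 24 ;;
    *)  echo "unknown priority level: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed when a walltime (whole hours) fits in a level.
# Usage: walltime_fits LEVEL HOURS
walltime_fits() {
  _limit=$(max_walltime_hours "$1") || return 1
  [ "$2" -le "$_limit" ]
}
```

For instance, walltime_fits p2 90 succeeds, while walltime_fits p3 90 fails because p3 is limited to 48h.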

Note

With p1 priority, users can also submit advance reservations. More information is available on the Advanced OAR page. For example, to reserve resources starting one week from now:

fnancy:

oarsub -q p1 -r "$(date +'%F %T' --date='+1 week')"

The p1 priority level also allows extending the duration of a job. The extension is only applied 24h before the end of the job, and cannot be longer than 168h. More information about this feature is also available on the Advanced OAR page.

	Warning
	These limits DO NOT replace the maximum walltime per node, which is still in effect.

You can check your priority level for any cluster using https://api.grid5000.fr/explorer.

	Note
	As of today, the resources explorer only shows basic information. Additional information will be added in the near future.

When submitting a job, by default, you will be placed at the highest priority level that allows you to maximize resources:

fnancy:

oarsub -q production -I

The command above will generally place your job at the lowest priority level, so that it can use all clusters, even those where your priority is p4.

When you specify a cluster, your job will be set to your highest priority level for that cluster:

fnancy:

oarsub -q production -p grele -I

You can also restrict a job submission to a specific priority level using -q PRIORITY_LEVEL (for example, -q p2):

fnancy:

oarsub -q p2 -l nodes=2,walltime=90 './yourScript.py'

Dashboards and status pages

Planned and ongoing maintenances, events and issues on Grid'5000

Nancy

Rennes

Contact information and support

For support, see the Support page.

Contacts:

The Grid'5000 site manager (responsable de site) for Nancy is Thomas Lambert (thomas.lambert@inria.fr); for Rennes, it is Anne Cécile Orgerie (anne-cecile.orgerie@irisa.fr).
Local mailing lists: all Grid'5000 users from Nancy and Rennes are automatically subscribed to nancy-users@lists.grid5000.fr or rennes-users@lists.grid5000.fr, respectively.

FAQ

Data storage

Research teams, groups of people from different teams, and individuals can ask for Group storages in order to manage their data at the team level. The main benefit of Group storages is that they let group members share their data (corpora, datasets, results...) and easily overcome the quota restrictions of the home directories.

Please remember that NFS servers (the home directories are also served by an NFS server) are quite slow at processing a huge number of small files during a computation. If this is your case, consider doing most of your I/O on the nodes and copying the results back to the NFS server at the end of the experiment.

See here for other kinds of storage available on the platform.
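The staging pattern described above can be sketched as a small shell function. This is only an illustration under assumptions: run_with_local_io is a hypothetical helper name, and /tmp (or $TMPDIR) is assumed to be on the node's local disk.

```shell
# Hypothetical helper: copy input from NFS to node-local scratch space, run a
# command there, then copy the results back to NFS in one pass.
# Usage: run_with_local_io NFS_INPUT_DIR NFS_OUTPUT_DIR COMMAND [ARGS...]
# COMMAND receives two extra trailing arguments: the local input directory
# and the local output directory.
run_with_local_io() {
  nfs_in=$1; nfs_out=$2; shift 2
  # Assumption: /tmp (or $TMPDIR) is local to the node.
  scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1
  mkdir -p "$scratch/in" "$scratch/out" "$nfs_out" || return 1
  cp -R "$nfs_in/." "$scratch/in/" || return 1   # one bulk read from NFS
  "$@" "$scratch/in" "$scratch/out"; rc=$?       # small-file I/O stays local
  cp -R "$scratch/out/." "$nfs_out/"             # one bulk write back to NFS
  rm -rf "$scratch"                              # free the local scratch space
  return "$rc"
}
```

A job script could then call, for example, run_with_local_io ~/dataset ~/results ./my_script.sh, where my_script.sh reads from its first argument and writes to its second.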

Nancy

Group storages are used to control access to the different storage spaces located on the storage[1-5].nancy.grid5000.fr NFS servers (more information about the maximum capacity of each of these servers can be found here). Ask your GGA leader whether your team has access to one or more storage spaces (this is the case, for instance, for the following teams: Bird, Capsid, Caramba, Heap, Multispeech, Optimist, Orpailleur, Semagramme, Sisr, Synalp, Tangram).

Rennes

Group storages are used to control access to the different storage spaces located on the storage2.rennes.grid5000.fr NFS server (more information about the maximum capacity of this server can be found here). Ask your GGA leader whether your team has access to one or more storage spaces (this is the case, for instance, for the following teams: cidre and sirocco (compactdisk storage)).

I am physically located in the LORIA/IRISA building, is there a shorter path to connect?

When you are located in the LORIA/IRISA building, you can benefit from a direct connection that does not go through the Grid'5000 national access machines (access-south and access-north). To do so, use access.nancy or access.rennes (instead of access).

mylaptop:

ssh jdoe@access.nancy.grid5000.fr

mylaptop:

ssh jdoe@access.rennes.grid5000.fr

Configure an SSH alias for the local access

To establish a connection to the Grid'5000 network from the local access, you can configure your SSH client as follows:

laptop:

editor ~/.ssh/config

Host g5kl
  User login
  Hostname access.site.grid5000.fr
  ForwardAgent no

Host *.g5kl
  User login
  ProxyCommand ssh g5kl -W "$(basename %h .g5kl):%p"
  ForwardAgent no

Reminder: login is your Grid'5000 username and site is either nancy or rennes.
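To avoid editing the placeholders by hand, the two Host entries can be generated from a login and a site. This is a minimal sketch: g5kl_ssh_config is a hypothetical helper name, and the ProxyCommand is assumed to go through the g5kl alias defined in the first entry.

```shell
# Hypothetical helper: print the SSH client configuration for local access.
# Usage: g5kl_ssh_config LOGIN SITE   (SITE is nancy or rennes)
g5kl_ssh_config() {
  g5k_login=$1; g5k_site=$2
  case "$g5k_site" in
    nancy|rennes) ;;
    *) echo "site must be nancy or rennes" >&2; return 1 ;;
  esac
  # \$ keeps the command substitution literal, so that ssh (not this
  # function) evaluates basename when a connection is opened.
  cat <<EOF
Host g5kl
  User $g5k_login
  Hostname access.$g5k_site.grid5000.fr
  ForwardAgent no

Host *.g5kl
  User $g5k_login
  ProxyCommand ssh g5kl -W "\$(basename %h .g5kl):%p"
  ForwardAgent no
EOF
}
```

For example, g5kl_ssh_config jdoe nancy >> ~/.ssh/config appends the entries for user jdoe on the Nancy site.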

With such a configuration, you can:

connect to the frontend of your local site

laptop:

ssh g5kl

transfer files from your laptop to your local frontend (with better bandwidth than using the national Grid'5000 access)

laptop:

scp myFile g5kl:~/

access the frontend of a different site:

laptop:

ssh grenoble.g5kl

transfer files from your laptop to a different frontend

laptop:

scp myFile sophia.g5kl:~/

How to access data hosted on Inria/Loria or Inria/Irisa servers

The Grid'5000 network is not directly connected to the Inria/Loria or Inria/Irisa internal servers. To access them from a Grid'5000 frontend and/or from Grid'5000 nodes, you need to go through a local bastion host. If you need to transfer data regularly, it is highly recommended to configure the SSH client on each Grid'5000 frontend.

	Note
	Please note that you have a different home directory on each Grid'5000 site, so you may need to replicate your SSH configuration across multiple sites.

Nancy

bastionssh.loria.fr is an access machine hosted on the Loria side. It can be used to access all services in the Inria/Loria environment.

frontend:

editor ~/.ssh/config

Host accessloria
   Hostname bastionssh.loria.fr
   # replace jdoe with your LORIA login
   User jdoe

Host *.loria
   ProxyCommand ssh accessloria -W $(basename %h .loria):%p
   # replace jdoe with your LORIA login
   User jdoe

	Note
	Given that `bastionssh.loria.fr` only accepts logins using an SSH key, you cannot simply connect with your LORIA password.

Rennes

transit.irisa.fr is an access machine hosted on the Irisa side. It can be used to access all services in the Inria/Irisa environment.

frontend:

editor ~/.ssh/config

Host transit
   Hostname transit.irisa.fr
   # replace jdoe with your IRISA login
   User jdoe

Data hosted on Inria's NAS server is accessible under /nfs on transit.irisa.fr. For example, assuming you have set up this configuration in your Grenoble home directory:

fgrenoble:

scp transit:/nfs/nas4.irisa.fr/repository ~/local_dir

Transfer files to Grid'5000 storage

With that setup, you can now use:

Rsync to synchronize your data in the Inria/Loria environment with your local home on a Grid'5000 frontend
Sshfs to mount your data directory from the Inria/Loria environment directly under your local home, i.e. to mount your /user/my_team/my_username directory (origin, reached through bastionssh.loria.fr) on a folder on fnancy (destination).

For example:

fnancy:

sshfs -o idmap=user jdoe@tregastel.loria:/users/myteam/jdoe ~/local_dir

To unmount the remote filesystem:

fnancy:

fusermount -u ~/local_dir

I submitted a job, there are free resources, but my job doesn't start as expected!

Most likely, this is because of our configuration of resource restrictions per walltime. To make sure that someone requesting only a few nodes for a short time can get them soon enough, the nodes are split into categories. The split depends on each cluster and is visible in the Gantt chart. An example split is:

20% of the nodes only accept jobs with walltime lower than 1h
20% -- 2h
20% -- 24h (1 day)
20% -- 48h (2 days)
20% -- 168h (one week)

Note that best-effort jobs are excluded from those limitations.

To see the exact walltime partition of each production cluster, have a look at the Nancy Hardware page or Rennes Hardware page.
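To make the example split above concrete, the sketch below computes the share of nodes that could accept a job of a given walltime. It is purely illustrative: eligible_node_percent is a hypothetical helper, the 1h/2h/24h/48h/168h categories are the example values above (not the real partition of any cluster), and a job is assumed to fit a category when its walltime is at most the category limit.

```shell
# Hypothetical helper: percentage of nodes whose walltime category can accept
# a job of HOURS hours, under the example split (five categories of 20% each).
# Usage: eligible_node_percent HOURS
eligible_node_percent() {
  hours=$1
  percent=0
  for limit in 1 2 24 48 168; do
    # Assumption: a job fits when its walltime is at most the category limit.
    if [ "$hours" -le "$limit" ]; then
      percent=$((percent + 20))
    fi
  done
  echo "$percent"
}
```

Under these assumptions, a 1h job can be scheduled on any node, whereas a 100h job only competes for the 20% of nodes in the one-week category. This is one reason a job may wait even though other nodes appear free.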

Another OAR feature that could impact the scheduling of your jobs is the OAR scheduling with fair-sharing, which is based on the notion of karma: this feature assigns a dynamic priority to submissions based on the history of submissions by a specific user. With that feature, the jobs from users that rarely submit jobs will be generally scheduled earlier than jobs from heavy users.

I have an important demo, can I reserve all resources in advance?

There is a special challenge queue that can be used to combine resources from the classic Grid'5000 clusters and the production clusters for special events. If you would like to use it, please ask the executive committee for special permission.

Can I use besteffort jobs in production?

Yes, you can submit a besteffort job on the production resources by using the OAR -t besteffort option. Here is an example:

fnancy:

oarsub -t besteffort -q production ./my_script.sh

If you do not specify the -q production option, your job may run on both production and non-production resources.

How to cite

If you use the Grid'5000 production clusters for your research and publish your work, please add this sentence in the acknowledgements section of your paper:

Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
