Production:FAQ
About this document
How to add or correct an entry in the FAQ?
What's the difference between this FAQ and Grid'5000 FAQ?
Note: This FAQ is dedicated to HPC and AI production uses of the platform, while the Grid'5000 FAQ deals with uses in the field of network experimentation and distributed computing.
What data spaces are available and how to access them?
Research teams, groups of people from different teams, or individuals can request Group storage spaces in order to manage their data at the team level. The main benefit of Group storage is that it allows the members of a group to share their data (corpora, datasets, results, ...) and to easily overcome the quota restrictions of the home directories.
Please remember that NFS servers (the home directories are also served by an NFS server) are quite slow at processing a huge number of small files during a computation. If you are in this situation, consider doing most of your I/O on the nodes' local disks and copying the results back to the NFS server at the end of the experiment.
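As a sketch of this pattern (the paths and the computation command are hypothetical), a job script could stage its working set on the node's local disk and only copy the results back to NFS at the end:

```shell
#!/bin/bash
# Sketch of the "compute locally, sync back" pattern (hypothetical paths).
SCRATCH=/tmp/$USER/$OAR_JOB_ID                  # node-local scratch; /tmp is an assumption
mkdir -p "$SCRATCH"
rsync -a ~/experiment/input/ "$SCRATCH/input/"  # stage the input once from NFS
cd "$SCRATCH"
./run_computation input/ output/                # hypothetical computation doing many small-file I/Os
rsync -a output/ ~/experiment/results/          # single bulk copy back to the NFS home
```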
See here for the other kinds of storage available on the platform.
Grenoble
FIXME: to be completed
Nancy
Group storages are used to control the access to the different storage spaces located on the storage[1-5].nancy.grid5000.fr NFS servers (more information about the maximum capacity of each of these servers can be found here). Ask your GGA leader whether your team has access to one or more storage spaces (this is the case, for instance, for the following teams: Bird, Capsid, Caramba, Heap, Multispeech, Optimist, Orpailleur, Semagramme, Sisr, Synalp, Tangram).
Rennes
Group storages are used to control the access to the different storage spaces located on the storage2.rennes.grid5000.fr NFS server (more information about the maximum capacity of this server can be found here). Ask your GGA leader whether your team has access to one or more storage spaces (this is the case, for instance, for the cidre and sirocco teams (compactdisk storage)).
Sophia
FIXME: to be completed
I am physically located in the LORIA/IRISA building, is there a shorter path to connect?
If you are located in the LORIA/IRISA building, you can benefit from a direct connection that does not go through the Grid'5000 national access machines (access-south and access-north). To do so, use access.nancy or access.rennes (instead of access).
Configure an SSH alias for the local access
To establish a connection to the Grid'5000 network from the local access, you can configure your SSH client as follows:
Host g5kl
  User login
  Hostname access.site.grid5000.fr
  ForwardAgent no

Host *.g5kl
  User login
  ProxyCommand ssh g5kl -W "$(basename %h .g5kl):%p"
  ForwardAgent no

Reminder: login is your Grid'5000 username and site is either nancy or rennes.
With such a configuration, you can:
- connect to the frontend of your local site
- transfer files from your laptop to your local frontend (with better bandwidth than through the national Grid'5000 access)
- access the frontend of a different site
- transfer files from your laptop to a different frontend
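With the aliases above, this could look like the following (the target after the proxy alias is an example; host resolution through the `ProxyCommand` is assumed to follow the configuration shown earlier):

```shell
# Connect to the local frontend through the short path
ssh g5kl
# Copy a file from the laptop to the local frontend home
scp results.tar.gz g5kl:
# Reach the frontend of another site through the proxy alias (example target)
ssh rennes.g5kl
```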
How to access data hosted on Inria/Loria or Inria/Irisa servers
The Grid'5000 network is not directly connected to the Inria/Loria or Inria/Irisa internal servers. If you want to access them from the Grid'5000 frontends and/or nodes, you need to go through a local bastion host. If you regularly need to transfer data, it is highly recommended to configure the SSH client on each Grid'5000 frontend accordingly.
Note: You have a different home directory on each Grid'5000 site, so you may need to replicate your SSH configuration across multiple sites.
Nancy
ssh-nge.loria.fr is an access machine hosted on the Loria side. That machine can be used to access all services in the Inria/Loria environment.
Host accessloria
  Hostname ssh-nge.loria.fr
  User jdoe   # to be replaced by your LORIA login

Host *.loria
  ProxyCommand ssh accessloria -W $(basename %h .loria):%p
  User jdoe   # to be replaced by your LORIA login
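With that configuration in place, any machine in the Loria environment can be reached through the proxy alias (the internal server name below is a hypothetical example):

```shell
# Log in to the Loria access machine directly
ssh accessloria
# Reach an internal Loria server through the proxy (hypothetical host name)
ssh myserver.loria
```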
Rennes
ssh-rba.inria.fr is an access machine hosted on the Irisa side. That machine can be used to access all services in the Inria/Irisa environment.
Host ssh-rba
  Hostname ssh-rba.inria.fr
  User jdoe   # to be replaced by your IRISA login
Data hosted on Inria's NAS server is accessible under /nfs on ssh-rba.inria.fr. Considering that you have set this configuration in your Rennes home directory:
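For instance (the remote and local paths are hypothetical), data under /nfs can be pulled to your Grid'5000 home with rsync through the ssh-rba alias:

```shell
# Copy a dataset from the Inria NAS (mounted under /nfs on ssh-rba)
# to the local Grid'5000 home directory; paths are examples
rsync -avz ssh-rba:/nfs/my_team/dataset/ ~/dataset/
```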
Transfer files to Grid'5000 storage
With that setup, you can now use:
- rsync to synchronize data between the Inria/Loria environment and your local home on a Grid'5000 frontend
- sshfs to directly mount your data directory from the Inria/Loria environment under your local home, i.e. mount /user/my_team/my_username (origin: ssh-nge.loria.fr) onto a folder on fnancy (destination)
e.g.:
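A possible invocation, assuming the accessloria alias from the Nancy section above and hypothetical paths:

```shell
# Create the local mount point on the frontend
mkdir -p ~/loria_data
# Mount the remote Loria directory there through SSHFS
sshfs accessloria:/user/my_team/my_username ~/loria_data
```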
To unmount the remote filesystem:
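Assuming the mount point used in the example above (~/loria_data is a hypothetical path):

```shell
# Detach the SSHFS (FUSE) mount from the local directory
fusermount -u ~/loria_data
```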
I submitted a job, there are free resources, but my job doesn't start as expected!
Most likely, this is because of our configuration of resource restrictions per walltime. To make sure that someone requesting only a few nodes for a short amount of time can get them soon enough, the nodes are split into categories. The split depends on each cluster and is visible in the Gantt chart. An example of split is:
- 20% of the nodes only accept jobs with walltime lower than 1h
- 20% -- 2h
- 20% -- 24h (1 day)
- 20% -- 48h (2 days)
- 20% -- 168h (one week)
Note that best-effort jobs are excluded from those limitations.
To see the exact walltime partition of each production cluster, have a look at the Nancy Hardware page or Rennes Hardware page.
Another OAR feature that can impact the scheduling of your jobs is fair-sharing, which is based on the notion of karma: it assigns a dynamic priority to submissions based on the history of submissions of each user. With that feature, jobs from users who rarely submit are generally scheduled earlier than jobs from heavy users.
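To check the karma associated with one of your submissions, you can inspect a job's full status (the job id below is a placeholder; the exact field name may vary with the OAR version):

```shell
# Show the full status of job 12345 and filter the karma field
oarstat -f -j 12345 | grep -i karma
```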
I have an important demo, can I reserve all resources in advance?
There's a special challenge queue that can be used to combine resources from the classic Grid'5000 clusters and the production clusters for special events. If you would like to use it, please ask for a special permission from the executive committee.
Can I use besteffort jobs in production?
Yes, you can submit a besteffort job on the production resources by using the OAR -t besteffort option. Here is an example:
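A minimal submission could look like this (the script name and resource request are examples):

```shell
# Submit a best-effort job on the production resources
# (1 node for 2 hours, running a hypothetical script)
oarsub -q production -t besteffort -l nodes=1,walltime=2 ./my_script.sh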
If you do not specify the -q production option, your job can run on both production and non-production resources.
How to cite
If you use the Grid'5000 production clusters for your research and publish your work, please add this sentence in the acknowledgements section of your paper:
Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).