Introduction
The Nancy Grid'5000 site also hosts nodes for production use. Those nodes are:
- talc, a 144-node Intel Xeon cluster, with a peak performance of 7360 GFlops
- mbi, a 32-node Intel Xeon cluster, with 7 Nvidia Tesla GPUs (nodes mbi-[19-26])
- graphique, a 6-node Intel Xeon cluster, with 2 Nvidia GTX 980 GPUs per node
The usage rules differ from the rest of Grid'5000:
- Advance reservations (oarsub -r) are not allowed (to avoid fragmentation). Only submissions (and reservations that start immediately) are allowed.
- All Grid'5000 users can use those nodes, but it is expected that users outside of LORIA / Inria Nancy -- Grand Est will use their own local production resources in priority, and mostly use those resources for tasks that require Grid'5000 features. Examples of local production clusters are Tompouce (Saclay), Igrida (Rennes), Plafrim (Bordeaux), etc.
Using the resources
Getting an account
Please use the request form here.
- The following fields must be filled in as indicated (the other fields must be filled in too):
- manager: lnussbaum
- site: nancy
- groups, roles: none
- privileges: user
- You are automatically subscribed to the Grid'5000 users' mailing list: users@lists.grid5000.fr
This list is the user-to-user and user-to-admin communication channel for help and support requests on Grid'5000.
Learning to use Grid'5000
Refer to the Getting Started tutorial. Other tutorials are listed on the Users Home page.
Using production resources
To access production resources, you need to submit jobs in the production queue:
# interactive job on any production node
oarsub -q production -I
# interactive job on the talc cluster specifically
oarsub -q production -p "cluster='talc'" -I
# interactive job on 2 nodes with a 240-hour walltime
oarsub -q production -l nodes=2,walltime=240 -I
# 24-hour deploy job running a long-lived command
oarsub -q production -l walltime=24 -t deploy 'sleep 100d'
...
Dashboards and status pages
- DrawGantt: Gantt diagram of jobs on the cluster
- Monika: currently running jobs
- Ganglia: status of nodes
- Status: planned and ongoing maintenances, events and issues on Grid'5000
Contact information and support
Contacts:
- The local system administrator is Clément Parisot (clement.parisot@inria.fr)
- The Grid'5000 site manager (responsable de site) for Nancy is Lucas Nussbaum (lucas.nussbaum@loria.fr)
- The rest of the Grid'5000 team can be contacted as described on the Support page.
To get support, you can:
- Use the users@lists.grid5000.fr mailing list: all Grid'5000 users (700+ people) are automatically subscribed
- Use the nancy-users@lists.grid5000.fr mailing list: all Grid'5000 users from Nancy are automatically subscribed
The Grid'5000 team does not have the manpower to provide user support, such as help with writing scripts, creating system images, etc. If you need such help, please contact the SED service.
FAQ
Data storage
Warning: There are currently problems with the storage on the Grid'5000 production resources. The following information will be updated frequently.
Data saved across successive reboots using rsync can be found on the Nancy frontend (fnancy) at /data/ftalc2_backup. It is mounted read-only.
If you asked for your data to be saved, please check that it was saved correctly.
If anyone forgot to report very important data on /data1 or /home, let us know as soon as possible!
As we still don't know how long it will take to finish the backup and bring NFS back, the talc nodes have been integrated into Grid'5000 and can be used from fnancy through the production queue.
I am physically located in the LORIA building, is there a shorter path to connect?
If for some reason you don't want to go through Grid'5000 national access machines (access-south and access-north), you can also connect directly using
I have a large amount of jobs to execute, is there a better solution?
Yes. You should have a look at CiGri. This middleware makes it easier to submit a large number of jobs as best-effort jobs on Grid'5000. The advantage is that your jobs benefit from the computing power of the whole of Grid'5000, not just of Nancy's production resources.
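CiGri drives campaigns of such jobs automatically; for reference, a single best-effort job can also be submitted by hand with OAR (a minimal sketch, where my_task.sh is a placeholder script):

# Best-effort jobs run on otherwise idle resources and may be killed
# at any time when a regular job claims the nodes.
oarsub -t besteffort -l nodes=1,walltime=2 ./my_task.sh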
How to access data in Inria/Loria
bastionssh.loria.fr is an access machine hosted on the Loria side. That machine can be used to access all services in the Inria/Loria environment.
You need to use an SSH ProxyCommand for that purpose.
Adapt the following lines in your ~/.ssh/config:
Host accessloria
    HostName bastionssh.loria.fr
    User jdoe
    Port 22
    IdentityFile /home/%u/.ssh/id_rsa
    ForwardAgent no

Host *.loria
    User jdoe
    ProxyCommand ssh accessloria -W $(basename %h .loria):%p
    IdentityFile /home/%u/.ssh/id_rsa
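With this configuration, any host name suffixed with .loria is reached transparently through the bastion. For example (the machine name here is hypothetical):

# Opens a shell on the internal machine "mymachine" via bastionssh.loria.fr
ssh mymachine.loria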
With that setup, you can now use:
- rsync, to synchronize data between the Inria/Loria environment and your local home on the Grid'5000 frontend
- sshfs, to mount your data directory from the Inria/Loria environment directly under your local home, i.e. mount your /user/my_team/my_username (origin: bastionssh.loria.fr) onto a folder on fnancy (destination)
For example, to mount and later unmount the remote directory (a sketch; the remote path and the local mount point are placeholders to adapt):
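# Mount the Loria directory on fnancy, through the accessloria alias above
mkdir -p ~/loria_home
sshfs accessloria:/user/my_team/my_username ~/loria_home

# To unmount the remote filesystem:
fusermount -u ~/loria_home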
I submitted a job, there are free resources, but my job doesn't start as expected!
Most likely, this is because of our configuration of resource restrictions per walltime. To make sure that someone requesting only a few nodes for a small amount of time will be able to get them soon enough, the nodes are split into categories:
- 20% of the nodes only accept jobs with walltime lower than 1h
- 20% -- 2h
- 20% -- 24h
- 20% -- 48h
- 20% accept all jobs (no limit on duration)
Note that best-effort jobs are excluded from those limitations.
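In practice, a shorter walltime gives your job more candidate nodes, so it usually starts sooner. For instance (an illustrative submission, not taken from the original page):

# A 1-hour job is eligible for all categories above, whereas a 48-hour
# job can only be scheduled on the last two (40% of the nodes).
oarsub -q production -l nodes=2,walltime=1 -I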
Another enabled OAR feature that can impact the scheduling of your jobs is the OAR karma: it assigns a dynamic priority to submissions based on each user's submission history. With that feature, jobs from users who rarely submit will generally be scheduled earlier than jobs from heavy users.
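To check your own karma, OAR can report per-user accounting over a date range (a sketch; the login and dates are placeholders, and the exact output depends on the OAR version):

# Prints consumed vs. requested resources and the resulting karma for jdoe
oarstat --accounting "2015-10-01, 2015-11-12" -u jdoe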
I have an important demo, can I reserve all resources in advance?
There's a special challenge queue that can be used to combine resources from the classic Grid'5000 clusters and the production clusters for special events. If you would like to use it, please get in touch with the cluster managers.
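If access is granted, submitting to that queue should look like a regular OAR job (a hypothetical sketch: the queue name and resource request below are assumptions to confirm with the managers):

# Assumed usage of the special queue; coordinate with the cluster managers first
oarsub -q challenge -l nodes=20,walltime=4 -I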
Energy costs
Grid'5000 nodes are automatically shut down when they are not reserved, so, when possible, it is a good idea to reserve nodes during cheaper time slots.
Electricity costs are currently:
- Periods:
  - Peak hours: December, January, February; 09:00-11:00 and 18:00-20:00
  - Winter full hours: 06:00-22:00 (excluding the peak hours specified in article 18)
  - Winter off-peak hours: 22:00-06:00
  - Summer full hours: 06:00-22:00
  - Summer off-peak hours: 22:00-06:00
  - Sundays count entirely as off-peak hours, in both winter and summer.
- Cost per kWh:
  - Peak hours: 10.893 c€/kWh
  - Winter full hours: 6.535 c€/kWh
  - Winter off-peak hours: 4.474 c€/kWh
  - Summer full hours: 4.125 c€/kWh
  - Summer off-peak hours: 2.580 c€/kWh
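As a rough illustration (assuming about 250 W per node, which is an assumption rather than a measured figure): a 10-node, 24-hour job consumes about 10 nodes x 0.25 kW x 24 h = 60 kWh, i.e. roughly 2.68 € at the winter off-peak rate versus about 3.92 € during winter full hours.

# Estimated job cost in euros: nodes * kW_per_node * hours * tariff_centimes / 100
# (the 0.25 kW per-node draw is an assumed figure; adjust to the real hardware)
echo "10 * 0.25 * 24 * 4.474 / 100" | bc -l   # winter off-peak: ~2.68
echo "10 * 0.25 * 24 * 6.535 / 100" | bc -l   # winter full hours: ~3.92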