Production:Getting Started
Note | |
---|---|
This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team. |
Note | |
---|---|
2025-01-30 - A specific documentation Web site for Abaca will go live shortly. In the meantime, specific pages for “Production” use are hosted in the Grid'5000 documentation. |
This tutorial will guide you through your first steps on Abaca. Before proceeding, make sure you have an Abaca/Grid'5000 account (if not, follow this procedure) and an SSH client.
Getting support
Note | |
---|---|
Abaca's network and server infrastructure is currently shared with Grid'5000. Support is coordinated by the Abaca and Grid'5000 teams. |
The Support page describes how to get help during your Production (Abaca) usage (common with Grid'5000 usage).
There is also a Production:FAQ page with the most common questions related to Production usage.
Before asking for support check if there is an ongoing maintenance or outage:
- Current or future downtimes due to maintenance are listed at https://www.grid5000.fr/status/.
Abaca general structure
Abaca provides access to large-scale computing nodes distributed across several sites.
The primary way to move around Abaca (and Grid'5000) is using SSH.
As described in the figure below, when using Abaca, you will typically:
- connect, using SSH, to an access machine
- connect from this access machine to a site frontend
- on this site frontend, reserve resources (nodes), and connect to those nodes
The access.grid5000.fr SSH gateway is hosted in a DMZ network at the Lille site.
This SSH gateway allows you to jump to the various Abaca and Grid'5000 sites.
Connecting for the first time
You must first obtain your account on the Abaca platform: Production#Getting_an_account
- NB: this does not concern users of clusters migrated to Abaca (Sophia-Nef, Grenoble...): their accounts will be initialized by the Abaca technical team (they will receive an e-mail notification).
The primary way to move around Abaca is using SSH. A reference page for SSH is also maintained with advanced configuration options that frequent users will find useful.
Typically, you will:
- connect, using SSH, to an access machine
- connect from this access machine to a site frontend
- on this site frontend, reserve resources (nodes), and connect to those nodes
Connect to Abaca access machine
To enter the Abaca network, one must use an access machine: access.grid5000.fr (note that access.grid5000.fr is an alias to access-north, currently hosted in Lille, or exceptionally to access-south, currently hosted in Sophia Antipolis, when access-north is unavailable due to maintenance or outage).
For all connections, you must use the login that was provided to you when you created your Abaca/Grid'5000 account.
You will get authenticated using the SSH public key you provided in the account creation form. Password authentication is disabled.
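For example, assuming your login is jdoe (a placeholder to replace with your own username):
ssh jdoe@access.grid5000.fr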
Connecting to a site
Abaca is structured in sites (Grenoble, Nancy, Rennes, Sophia...). Each site hosts one or more clusters (homogeneous sets of machines, usually bought at the same time).
To connect to a particular site, do the following:
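For example, from the access machine, to reach the Nancy frontend (Nancy being just an example site):
ssh nancy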
- Home directories
You have a different home directory on each Abaca site, so you will usually use rsync or scp to move data around.
On access machines, you have direct access to each of those home directories through NFS mounts (but using that feature to transfer very large volumes of data is inefficient). Typically, to copy a file to your home directory on the Nancy site, you can use:
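For instance, a sketch where jdoe and my_file.c are placeholders, assuming each site's home appears on the access machine as a directory named after the site:
scp my_file.c jdoe@access.grid5000.fr:nancy/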
Abaca does NOT have a BACKUP service for users' home directories: it is your responsibility to save important data somewhere outside Abaca (or at least to copy data to several Abaca sites in order to increase redundancy).
Quotas are applied on home directories -- by default, you get 25 GB per site. If your usage requires more disk space, it is possible to request quota extensions in the account management interface, or to use other storage solutions (see Storage).
SSH configuration
- Configure SSH aliases using the ProxyCommand option. Using this, you can avoid the two-hop connection (access machine, then frontend) and instead establish connections directly to frontends. This requires using OpenSSH, which is the SSH software available on all GNU/Linux systems, macOS, and also recent versions of Microsoft Windows.
In the ~/.ssh/config file on your workstation, add:
Host g5k
  User login
  Hostname access.grid5000.fr
  ForwardAgent no
Host *.g5k
  User login
  ProxyCommand ssh g5k -W "$(basename %h .g5k):%p"
  ForwardAgent no
Reminder: login is your Grid'5000 username.
Once done, you can establish connections to any machine (first of all: frontends) inside Grid'5000 directly, by suffixing .g5k to its hostname (instead of first having to connect to an access machine). E.g.:
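For example, to open a shell on the Nancy frontend directly from your workstation:
ssh nancy.g5k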
VPN access
A VPN service is also available, allowing you to connect directly to any Abaca machine (bypassing the access machines). See the VPN page for more information.
Managing data
Once you have logged on to Abaca and started your calculations, you need to transfer your data and store it on the platform.
The following table summarizes and compares the different storage resources available on Abaca (persistent or non-persistent):
Storage Resource | Backups? | Protocol used | Persistence period | Provisioning mechanism | Network connectivity |
---|---|---|---|---|---|
/home | No | NFS | long-term | Quota + User Acct mgmt | Variable (1Gb/s - 10 Gb/s - 25 Gb/s) |
Group Storage | No | NFS | long-term | Manual | Variable (1Gb/s - 10 Gb/s - 25 Gb/s) |
/tmp | No | - | short-term (job) | OAR job | - |
Home directories
This is the principal storage space when logged in on an Abaca site: site:/home/username.
The home directory is a network filesystem (NFS): data in your home directory is not actually stored on the node itself, it is stored on a storage server managed by the Abaca team. In particular, it means that all reserved nodes share the same home directory, and it is also shared with the site frontend. For example, you can compile or install software in your home (possibly using pip, virtualenv), and it will be usable on all your nodes.
The home directory is only shared within a site. Two nodes from different sites will not have access to the same home.
- In terms of storage size
Each user has a default quota of 25GB of storage on each site (soft limit), with a reserve of 100GB (hard limit).
- the soft limit is set to what the admins find a reasonable limit for an account on a more or less permanent basis. You can use more disk space temporarily, but you should not try and trick the system to keep that data on the shared file system.
- the hard limit is set so as to preserve usability for other users if one of your scripts produces unexpected amounts of data. You'll not be able to override that limit.
Home directories are limited to 200GB. Above that threshold, you should consider asking for a group storage instead, preferably at the team level, since it will be usable and shareable by all of your teammates. If your team does not yet have a group storage, talk about this to your Group manager.
Should you need higher quotas, please visit your user account settings page at https://api.grid5000.fr/ui/account (homedir quotas tab) and fill in a request.
- In terms of number of files
Each user has a default inode hard quota of 10 million. This means that users cannot store more than 10 million files in their home directory.
Information on your current inode usage on a site can be obtained with the following command launched on the site's frontend:
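A minimal sketch, assuming the standard quota tool is available on the frontend (it reports both block and inode usage and limits):
quota -s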
Group Storage
This service provides large storage spaces, possibly shared between multiple users (members of a research team or a laboratory or a specific project...). Those storage spaces are accessible on Abaca over NFS from all sites.
For instance, to access the "linkmedia" storage hosted on the storage2 server at Rennes from the Nancy frontend:
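A sketch of the usual mount-point convention (check the exact path on the Group Storage page):
cd /srv/storage/linkmedia@storage2.rennes.grid5000.fr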
Please remember that these data are hosted on an NFS server, which is not recommended for compute usage.
/tmp
The /tmp/ directory is stored on a local disk of the node. Use this directory if you need to access data locally.
The size of /tmp differs from node to node. It is equal to the total size of the (first) local disk minus 75 GB (which is reserved for the operating system).
How to put and retrieve your data
You have a different home directory on each Abaca site, so you will usually use Rsync to move data around.
On access machines, you have direct access to each of those home directories through NFS mounts (but using that feature to transfer very large volumes of data is inefficient). Typically, to copy a file to your home directory on the Nancy site, you can use:
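For instance, with jdoe and my_dataset as placeholders:
rsync -avP my_dataset/ jdoe@access.grid5000.fr:nancy/my_dataset/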
We recommend using rsync instead of scp for better performance with multiple files.
For better bandwidth or latency, you may also be able to connect directly via the local access machine of one of the Abaca sites, using access.site.grid5000.fr instead of access.grid5000.fr. However, mind that per-site access restrictions are applied: see External access for details about local access machines.
- other methods
- Access your data from your laptop using SSHFS
- VS Code usage is permitted with the following recommendations:
- Restrict the workspace to the folder containing your code and not the whole remote homedir or group storage
- Use an updated version of the VS Code Remote extension (including the memory leak bug fix)
- Prefer sshfs access to your remote data
- Edit files over SSH with your favorite text editor, with e.g.:
outside: vim scp://nancy.g5k/my_file.c
Discovering, visualizing and using resources
At this point, you are connected to Abaca and have transferred your data. You are now ready to use the platform's computing resources.
Exploring resources
There are several ways to learn about the site's resources and their status.
- Resources Explorer
- The Resources Explorer Hardware page contains a detailed description of the site's hardware and the access level you have based on the group(s) you belong to. For example:
- Grid'5000 resources are also available with the -q default option, but you then have to follow the usage rules of Grid'5000
- MOTD
- The site's MOTD (message of the day) on the frontend lists all clusters and their features. Additionally, it gives the list of current or future downtimes due to maintenance.
- Site's resources availability (OAR) status
- Drawgantt (past, current and future OAR jobs scheduling)
- Monika (current placement and queued jobs status)
Using resources
Each job (resource reservation) is assigned a priority level (from P1, the maximum, to P4) according to the resources requested and the group to which the user belongs.
The job queue is sorted according to job priority: a job with priority P1 will always run before a job with priority P2, even if job P2 entered the queue first. On the other hand, lower-priority jobs are not interrupted if a higher-priority job is submitted (if a P2 job has already started, it is not interrupted when the P1 job enters the queue).
These levels P1 to P4 also define the maximum duration of jobs, the possibility (or otherwise) of reserving resources in advance (on a calendar basis), and the possibility of extending jobs.
On certain resources, certain groups of users may be restricted to "best-effort" (pre-emptible) jobs. These jobs are automatically stopped when a higher-priority job requires the resources in question.
Finally, on some resources, certain groups may have no access at all.
Queue Priority | Max walltime | Reservations in advance | Extension of job duration |
---|---|---|---|
P1 | 168h (seven days) | yes | yes |
P2 | 96h (four days) | no | no |
P3 | 48h (two days) | no | no |
P4 | 24h (one day) | no | no |
besteffort | N/A | no | no |
You can check your priority level for any cluster using the Resources Explorer Browse page.
Moreover, with P1 priority, users can submit advance reservations. More information is available on the Advanced OAR page. For example, to reserve one week from now:
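A possible submission (resource amounts and walltime are arbitrary examples):
oarsub -q production -l host=1,walltime=168 -r "$(date +'%Y-%m-%d %H:%M:%S' --date='+1 week')"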
The P1 priority level also allows extending the duration of a job. The extension is only applied 24h before the end of the job and cannot be longer than 168h. More information about this feature can also be found on the Advanced OAR page.
Warning | |
---|---|
These limits DO NOT replace the maximum walltime per node, which is still in effect. |
When submitting a job, by default, you will be placed at the highest priority level that allows you to maximize resources:
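A minimal sketch of such a default submission (interactive, one host, no cluster constraint):
oarsub -q production -l host=1 -I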
Using the command above will generally place your job at the lowest priority to allow usage of all clusters, even those where your priority is p4.
When you specify a cluster, your job will be set to your highest priority level for that cluster:
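For example, with grue as a placeholder cluster name:
oarsub -q production -p "cluster='grue'" -l host=1 -I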
You can also limit a job submission to a cluster at a specific priority level using -q <priority level>:
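A sketch, assuming the priority level is given as the queue name (p2 here) and grue is a placeholder cluster:
oarsub -q p2 -p "cluster='grue'" -l host=1 -I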
Exclusive use of team-funded resources
For team-funded resources, the resources are either shared with a higher priority for the team (SHARED), or exclusive to the team and only accessible on a "best-effort" basis by others (EXCLUSIVE).
- /!\ best-effort jobs can be preempted by other jobs with higher priority
Teams decide for themselves which access mode to use, which can evolve over time, with a maximum total duration of 5 years in EXCLUSIVE mode, which the team can spread out as it sees fit.
This enables the funding team to ensure that the resources will always be available for its users during periods of heavy use, while still allowing a high degree of sharing with other teams outside these periods.
Submitting jobs with OAR
Interactive usage
To reserve a single host (one node) for one hour, in interactive mode, do:
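For example (the one-hour walltime is stated explicitly here):
oarsub -q production -l host=1,walltime=1 -I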
As soon as the resource becomes available, you will be directly connected to the reserved resource with an interactive shell, as indicated by the shell prompt, and you can run commands on the node:
Reserving only part of a node
To reserve only one CPU core in interactive mode, run:
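For example:
oarsub -q production -l core=1 -I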
OAR properties usage
You can select resources with specific hardware properties: CPU, memory, GPU model...
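A couple of sketches; the property names are taken from the OAR Properties page and the values are arbitrary examples to adapt (memnode is assumed to be expressed in MB):
oarsub -q production -p "gpu_count>=2 AND gpu_compute_capability_major>=7" -l host=1 -I
oarsub -q production -p "memnode>=256000" -l host=1 -I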
Properties list can be found on OAR Properties and OAR_Syntax_simplification.
Non-interactive usage (scripts)
You can also simply launch your experiment along with your reservation:
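For example, assuming an executable script named my_experiment.sh in your home directory (placeholder name):
oarsub -q production -l host=1,walltime=2 ./my_experiment.sh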
Your program will be executed as soon as the requested resources are available. As this type of job is not interactive, you will have to check for its termination using the oarstat command.
- Batch job using OAR scripts
Similarly to the standard usage of batch schedulers (e.g. SLURM), a good practice is to use a script that includes the OAR directives defining the resource request. Here is a simple example of such a script, which selects a GPU with specific characteristics.
Properties list can be found on OAR Properties and OAR_Syntax_simplification.
#!/bin/bash
#OAR -q production
#OAR -l host=1/gpu=1
#OAR -l walltime=3:00:00
#OAR -p gpu-16GB AND gpu_compute_capability_major>=5
#OAR -O OAR_%jobid%.out
#OAR -E OAR_%jobid%.err
# display some information about attributed resources
hostname
nvidia-smi
# make use of a python torch environment
module load conda
conda activate pytorch_env
python3 -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))";
The script must be executable and can be submitted from the frontend as shown below; it will start when resources become available.
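For example, assuming the script above was saved as my_script.sh (placeholder name); the -S option tells oarsub to read the #OAR directives from the script:
chmod u+x my_script.sh
oarsub -S ./my_script.sh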
Change default job specifications
In Abaca, the smallest unit of resource managed by OAR is the core (CPU core), but by default an OAR job reserves a host (a physical computer including all its CPUs and cores, and possibly GPUs). Hence, what OAR calls nodes are hosts (physical machines). In the oarsub resource request (-l arguments), nodes is an alias for host, so both are equivalent. But prefer using host, for consistency with other arguments and other tools that expose host, not nodes.
- Other types of resources
To reserve only one GPU (with the associated CPU cores and share of memory) in interactive mode, run:
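For example:
oarsub -q production -l gpu=1 -I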
To reserve several GPUs and ensure they are located in a single node, make sure to specify host=1:
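For example, for two GPUs on the same node:
oarsub -q production -l host=1/gpu=2 -I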
- Choosing the job duration
Of course, you might want to run a job for a different duration than one hour. The -l option allows you to pass a comma-separated list of parameters specifying the needed resources for the job, and walltime is a special resource defining the duration of your job:
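For example, to request one host for 2 hours and 30 minutes:
oarsub -q production -l host=1,walltime=2:30 -I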
The walltime is the expected duration you envision to complete your work. Its format is [hour:min:sec|hour:min|hour]. For instance:
- walltime=5 => 5 hours
- walltime=1:22 => 1 hour and 22 minutes
- walltime=0:03:30 => 3 minutes and 30 seconds
- Working with more than one node
You will probably want to use more than one node on a given site.
To reserve two hosts (two nodes), in interactive mode, do:
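For example:
oarsub -q production -l host=2 -I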
or equivalently (nodes is an alias for host):
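For instance:
oarsub -q production -l nodes=2 -I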
You will obtain a shell on the first node of the reservation. It is up to you to connect to the other nodes and distribute work among them.
By default, you can only connect to nodes that are part of your reservation. If you completely own the nodes within one job (or with one job per complete node), you will be able to connect to them using ssh. In the case of nodes that are not completely owned within a job (if you have reserved only part of a node, or have multiple jobs on the same node), you will have to use the oarsh connector to go from one node to the other. The connector supports the same options as the classical ssh command, so it can be used as a replacement for software expecting ssh.
Note | |
---|---|
To take advantage of several nodes and distribute work between them, a good option is GNU_Parallel. |
oarsh is a wrapper around ssh that enables the tracking of user jobs inside compute nodes (for example, to enforce the correct sharing of resources when two different jobs share a compute node). If your application does not support choosing a different connector, be sure to reserve nodes entirely (which is the default with oarsub) to be able to use ssh.
- Selecting nodes from a specific cluster
- Reserve nodes from a specific cluster
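A sketch, with grue as a placeholder cluster name:
oarsub -q production -p "cluster='grue'" -l host=2,walltime=2 -I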
- Selecting specific nodes
If you know the exact node you want to reserve, you can specify the hostname of the node you require:
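A sketch, with a placeholder node name (use the fully qualified hostname):
oarsub -q production -p "host='grue-1.nancy.grid5000.fr'" -l host=1 -I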
If you want several specific nodes, you can use a list:
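For instance (placeholder node names):
oarsub -q production -p "host IN ('grue-1.nancy.grid5000.fr', 'grue-2.nancy.grid5000.fr')" -l host=2 -I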
- Using OAR properties
The OAR nodes database contains a set of properties for each node, and the -p option actually filters based on these properties:
- Nodes with Infiniband FDR interfaces:
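A sketch (the ib property and its values should be checked against the OAR Properties page):
oarsub -q production -p "ib='FDR'" -l host=2 -I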
- Nodes with 2 GPUs:
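For instance:
oarsub -q production -p "gpu_count=2" -l host=1 -I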
- Nodes with a specific CPU model:
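A sketch; the CPU model string is an arbitrary example, check the exact cputype value in the Resources Explorer:
oarsub -q production -p "cputype='Intel Xeon Gold 6130'" -l host=1 -I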
- Since -p accepts SQL, you can write advanced queries:
fnancy: oarsub -p "wattmeter=YES AND host NOT IN (graffiti-41, graffiti-42)" -l host=5,walltime=2 -I
The OAR properties available on each site are listed on the Monika pages linked from Status (example page for Nancy). The full list of OAR properties is available on this page.
Note | |
---|---|
Since this is using a SQL syntax, quoting is important! Use double quotes to enclose the whole query, and single quotes to write strings within the query. |
Using nodes in the default environment
Debian std environment
When you run oarsub, you gain access to physical nodes with a default (standard) software environment. This is a Debian-based system that is regularly updated by the technical team. It comes with many pre-installed software packages.
The debian std (e.g. debian11-std) environments are the environments used on nodes by default, providing services such as oar-node as well as custom settings that are necessary for the default system but are useless for user-deployed nodes.
Getting access to the software you need
There are several options to get access to software :
- Many software packages are already installed and directly accessible: Git, editors, GCC, Python, Pip, Ruby, Java, ...
- Some software (mostly scientific software, such as MatLab) is available through modules. For a list, use module avail. Documentation (including how to access license tokens) is available on the Modules page.
- If the software you need is not available through the above options, you can:
- Install it manually in your home directory
- Get root access on your node using the sudo-g5k command, and then customize the operating system. The node will be reinstalled at the end of your resource reservation, so that it is in a clean state for the next user. It is thus best to avoid running sudo-g5k in very short jobs, as this has a cost for the platform (see below).
- Install it using a user-level package manager, such as Guix (especially suitable for HPC software) and Conda (especially suitable for AI software)
- Install it using container technology, with Docker or Singularity/Apptainer
- Re-install the node using a custom image with Kadeploy, as described in the following section
- Engage in a discussion with the support team to see if the software you need could be added to the software available by default
Becoming root with sudo-g5k
On HPC clusters, users typically don't have root access. However, Abaca allows more flexibility: if you need to install additional system packages or to customize the system, it is possible to become root. The tool to do this is called sudo-g5k.
Note | |
---|---|
Using sudo-g5k has a cost for the platform: at the end of your job, the node needs to be completely reinstalled so that it is clean for the next user. It is best to avoid running sudo-g5k in very short jobs. |
Note | |
---|---|
Using sudo-g5k only works on full nodes, not partially reserved ones. |
(Optional) Deploying your nodes to get root access and create your own experimental environment
Note | |
---|---|
There is a tool, called sudo-g5k (see above), that provides root access on the standard environment without requiring a full redeployment of the node. |
Deploying a system on nodes with Kadeploy
Reserve one node (the deploy job type is required to allow deployment with Kadeploy):
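For example:
oarsub -q production -t deploy -l host=1,walltime=2 -I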
Start a deployment of the debian11-base environment on that node (this takes 5 to 10 minutes):
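A sketch (depending on the Kadeploy version installed, the environment may instead have to be passed with -e debian11-base):
kadeploy3 debian11-base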
By default, all the nodes of the reservation are deployed. Alternatively, you can use -m to specify a node (such as -m gros-42.nancy.grid5000.fr).
Kadeploy copies your SSH key from ~/.ssh/authorized_keys to the node's root account after deployment, so that you can connect without a password. You may want to use another SSH key with -k (such as -k ~/custom_authorized_keys).
Finally, connect to the node as root:
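For example, reusing the node name from the example above:
ssh root@gros-42.nancy.grid5000.fr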
On Grid'5000 reference environments
Grid'5000 reference environments are named according to the following scheme: OSversion-architecture-variant.
- OSversion is the OS distribution name and version, for instance debian11 (Debian 11 "Bullseye", released on 08/2021), ubuntu2004 (Ubuntu 20.04 "Focal", released on 04/2020), or centos8 (CentOS 8, a clone of RHEL 8, released on 09/2019).
- variant defines the set of features included in the environment, as follows:
The variants differ in the Grid'5000-specific tuning for performance (e.g., TCP buffers for 10 GbE), the installed tools (standard system utilities*, common utilities**, scientific software available via module, packages available via Guix, advanced packages***), the network storage that is accessible, and hypervisor support:
Variant | OS available | Notes |
---|---|---|
min | Debian; Ubuntu, CentOS, etc. | |
base | Debian | |
nfs | Debian; Ubuntu, CentOS, etc. | Support for mounting your home and group storage, and for using your Grid'5000 user account on the node |
big | Debian | |
xen | Debian | Xen hypervisor Dom0 + minimal DomU |
* Including SSH server and network drivers.
** Including among others: Python, Ruby, curl, git, vim, etc.
*** Packages for development, system tools, editors and shells.
The list of all supported environments is available by running kaenv3 on any frontend. Note that environments are versioned: old versions can be listed using the kaenv3 -l -s command, and a former version can be retrieved and used by adding the --env-version YYYYMMDDHH option to the kaenv3 or kadeploy3 commands (also see the man pages). This can be useful to reproduce experiments months or years later, using a previous version of an environment. On some sites, environments exist for different architectures (x86_64, ppc64le and aarch64). The full list can be found in the Advanced Kadeploy page.
The Grid'5000 reference environments are built using the kameleon tool, from recipes detailing the whole construction process, and they are updated on a regular basis (see versions). See the Environment creation page for details.
Going further
In this tutorial, you learned the basics of Abaca:
- The general structure,
- How to connect and move between sites
- How to manage your data (one NFS server per site; remember: it is not backed up)
- How to find and reserve resources
- How to submit jobs using OAR and the oarsub command
- How to get root access on nodes using Kadeploy and the kadeploy3 command
You should now be ready to use Abaca.
There are many more tutorials available on the Users Home page. Please have a look at that page to continue learning how to use Abaca.
Most of them are intended for Grid'5000 experimental usage. Those to be considered for Abaca production usage are: