Storage: Difference between revisions

 
(24 intermediate revisions by 10 users not shown)
Line 10: Line 10:


In Grid'5000, there are multiple resources for data storage in experiments. Each has its own characteristics, advantages and disadvantages. They are summarised in the table below. Further details are discussed in the following sub-sections.
In Grid'5000, there are multiple resources for data storage in experiments. Each has its own characteristics, advantages and disadvantages. They are summarised in the table below. Further details are discussed in the following sub-sections.
Remember that Grid'5000 does NOT have a BACKUP service for storage it provides: it is your responsibility to save important data outside Grid'5000 (or at least to copy data to several Grid'5000 sites in order to increase redundancy).


== Comparison table of storage resources ==
== Comparison table of storage resources ==
Line 16: Line 18:
{| class="wikitable"
{| class="wikitable"
|-
|-
! Storage Resource !! Data recoverability? !! Protocol used !! Persistence period !! Provisioning mechanism !! Network connectivity !! Remarks
! Storage Resource !! Backups? !! Protocol used !! Persistence period !! Provisioning mechanism !! Network connectivity !! Remarks
|-
|-
| [[#.2Fhome|/home]] || No || NFS || long-term || Quota + User Acct mgmt || Variable (1Gb/s - 10 Gb/s) || Note-1
| [[#.2Fhome|/home]] || No || NFS || long-term || Quota + User Acct mgmt || Variable (1Gb/s - 10 Gb/s) || Note-1
|-
|-
| OSIRIM || No || NFS || long-term || Quota || 1Gb/s || Note-1
| [[#Group Storage|Group Storage]] || No || NFS || long-term || Manual || Variable (1Gb/s - 10 Gb/s) || Note-1
|-
|-
| [[#storage5k|storage5k]] || No || NFS || medium-term || OAR || Variable (1Gb/s - 10 Gb/s) || Note-1
| [[#On_node_local_disks_reservation|On node local disks reservation]] || No || - || medium-term || OAR || - || Note-2, Note-4
|-
|-
| [[#On_Node_local_disks_reservation|On node local disks reservation]] || No || - || medium-term || OAR || - || Note-2
| [[#Managed Ceph cluster|Managed Ceph cluster]] || No || RADOS || medium-term || Temporarily free (until contention for space) || ~n X 10 Gb/s (n = parallelism) || Note-3
|-
|-
| [[#Storage Array|Storage Array]] || No || NFS || long-term || Manual || 10Gb/s inside LAN, 6Gb/s between storage array and server || Note-1
| /tmp || No || - || short-term (job) || OAR job || - ||
|-
|-
| [[#Managed Ceph cluster|Managed Ceph cluster]] || No || RADOS || medium-term || Temporarily free (until contention for space) || ~n X 10 Gb/s (n = parallelism) || Note-3
| [[#Local_disks|Local disks]] || No || - || short-term (job) || OAR job || - || Note-4
|}
|}


* '''''Note-1''''': These storage resources use NFS accessed by multiple users. Hence, the performance is strongly dependent on the degree of contention during the time of experiment.
* '''''Note-1''''': These storage resources use NFS accessed by multiple users. Hence, the performance is strongly dependent on the degree of contention during the time of experiment.
* '''''Note-2''''': This storage uses the hard disks of the reserved node.
* '''''Note-2''''': This storage uses the local disks of the reserved node, but allows to reserve disks separately from the node.
* '''''Note-3''''': There are 2 factors to consider for performance:
* '''''Note-3''''': There are 2 factors to consider for performance:
** aggregated network bandwidth of all the nodes in the cluster:
** aggregated network bandwidth of all the nodes in the cluster:
*** for the managed Ceph Cluster at rennes (4 ''nodes'' each with 1 network interface), n = 4,
*** for the managed Ceph Cluster at rennes (4 ''nodes'' each with 1 network interface), n = 4,
*** for the managed Ceph Cluster at nantes (3 ''nodes'' each with 1 network interface), n = 3,
*** for the managed Ceph Cluster at nantes (3 ''nodes'' each with 1 network interface), n = 3,
** the aggregated disk bandwidth of all ''Object Storage Devices'' (OSD) used in the cluster. The recommendation is to have 1 OSD per physical disk.  
** the aggregated disk bandwidth of all ''Object Storage Devices'' (OSD) used in the cluster. The recommendation is to have 1 OSD per physical disk.
** For further interesting details on '''''maximum storage capacity''''' and '''''degree of parallelism''''' in Ceph clusters, see [[Unmaintained:BigData_Ceph5k_tutorial#Improving_performance_through_higher_parallelism_.28more_OSD_nodes.29|here]].
* '''''Note-4''''': Local disks need to be manually partitioned and formatted with a filesystem
 
== /home ==
This is the principal storage space when logged-in on a Grid'5000 site: <code class=replace>site</code><code class=file>:/home/</code><code class=replace>userid</code>. It is based on File System exposed by NFS.
 
Every user has a <code class=file>/home/</code><code class=replace>userid</code> storage in every Grid'5000 site (content is not automatically replicated nor backuped).
 
; In term of storage size:
Each user has a default quota of 25GB of storage on each site (soft limit), with a reserve of 100GB (hard limit).
* the soft limit is set to what the admins find a reasonable limit for an account on a more or less permanent basis. You can use more disk space temporarily, but you should not try and trick the system to keep that data on the shared file system.
* the hard limit is set so as to preserve usability for other users if one of your scripts produces unexpected amounts of data. You'll not be able to override that limit.
 
We have a '''limitation of 200GB per home dir''', above that threshold you should consider asking for a [[Group_Storage|group storage]] instead, preferably at the team level since it will be usable/shareable by all of your team mates. If your team does not have yet a group storage, talk about this to your GGA manager.
 
Should you need higher quotas, please visit your user account settings page at https://api.grid5000.fr/ui/account (''homedir quotas'' tab) at fill a request.
 
; In term of storage access:
Access to <code class=file>/home</code> is manageable using the [[Storage Manager]].


For further interesting details on '''''maximum storage capacity''''' and '''''degree of parallelism''''' in Ceph clusters, see [[Unmaintained:BigData_Ceph5k_tutorial#Improving_performance_through_higher_parallelism_.28more_OSD_nodes.29|here]].
; In term of files number
Each user has a default inodes hard quota of 10 million.
This means that users aren't able to store more than 10 million files on their home.


== /home ==
Information on your current inode usage can be obtained with the following command:
This is the principal storage space when logged-in on a Grid'5000 site: <code class="command">site:/home/userid</code>. It is based on File System exposed by NFS. Each user has a quota of 25GB of storage on each site, with a reserve of 100GB. If required, the user can request to increase the size of the quota, using the [https://api.grid5000.fr/ui/account#mystorage account management interface].
{{Term|location=frontend|cmd=<code class="command">quota</code>}}
 
The output of this command looks like this:
 
<syntaxhighlight>
user@frontend:~$ quota
Disk quotas for user <login> (uid <uid>):
    Filesystem  blocks  quota  limit  grace  files  quota  limit  grace
nfs:/export/home
                    48  25000000 100000000              12      10000000      0 
</syntaxhighlight>
Inodes quotas-related information are the last 3 numbers printed by the command.
On the output above, 12 files are currently stored, and the hard limit set is 10 million.  
 
{{Note|text=If you have reached your inode hard quota, you won't be able to create new files on your home. You will have to remove files from your home to be able to create new files again.}}
 
 
; if term of data sharing
 
Home directories are user protected by default.  
 
It is possible to use the <code>public</code> directory located under the home directory to share data with all others Grid'5000 users using the following URLs:
 
* from inside Grid'5000 : http://public.SITE.grid5000.fr/~USERNAME/
* from outside Grid'5000 (with Grid'5000 credentials basic authentication) : https://api.grid5000.fr/sid/sites/SITE/public/USERNAME/


== OSIRIM ==
== Group Storage ==
This storage space is available under <code class="command">/srv/osirim/<username></code> directory from frontends and nodes (in default environment or deployed -nfs and -big). It is provided by the OSIRIM project (IRIT lab) and exposed by NFS (using an autofs mount). Each user has a quota of 200GB of storage. If required, user can request to increase the size of the quota, by sending an email to [mailto:support-staff@lists.grid5000.fr support-staff@lists.grid5000.fr].
This service provides large storage spaces, possibly shared between multiple Grid'5000 users. Those storage spaces are accessible on Grid'5000 over NFS. User can ask for a group storage creation by following the procedure described in the [[Group Storage]] page.


== [[storage5k]] ==
'''''Advantages:''''' Shareable storage among a group of user, persistent, easy access
This is another shared storage resource offered on certain sites of Grid'5000 (e.g. rennes, nancy, sophia, luxembourg). Space on storage5k can be reserved in chunks of 10GB each, over weeks or months.


'''''Advantages:''''' Possibility of easy persistent storage over a series of experiments.  
'''''Disadvantage:''''' The disadvantages are those of an NFS server with multiple simultaneous users.


'''''Disadvantage:''''' The disadvantages are those of an NFS server with multiple simultaneous users. Also, this tool is not available from every Grid5000 site (see [[Storage5k]] for details)
'''''Documentation:''''' [[Group Storage]]


== [[Disk_reservation|On Node local disks reservation]] ==
== On node local disks reservation ==
[[Disk_reservation|Disk reservation]] consists in reserving extra local hard disks of nodes, in order to locally store large datasets between reservations, and thus avoid the need of moving data to nodes at the beginning of every node reservation.  
[[Disk_reservation|Disk reservation]] allows to reserve extra local hard disks of nodes, in order to locally store large datasets between reservations, and thus avoid the need of moving data to nodes at the beginning of every node reservation.  
Disk reservation provides medium-term storage persistence.
Disk reservation provides medium-term storage persistence.
Once reserved, these disks can be used exactly like [[#Local_disks]].


'''''Advantages:''''' Storage is available directly on the hard disk of the reserved node.
'''''Advantages:''''' Storage is available directly on the hard disk of the reserved node.
Line 62: Line 109:
'''''Disadvantage:''''' You need to reserve a hard disk on a node, and then reserve the same node for carrying out your experiment.
'''''Disadvantage:''''' You need to reserve a hard disk on a node, and then reserve the same node for carrying out your experiment.


== Storage Array ==
'''''Documentation:''''' [[Disk_reservation|Disk reservation]]
This is another shared resource that consists of a bay of RAID disks, located at the Rennes site, cumulatively offering ~180 TB of storage space. This is long-term persistent storage required over months (or the duration of a research project). Reservation is manual, it is not automatic. To reserve this resource, prospective users need to contact the Grid'5000 technical team (Email: support-staff@lists.grid5000.fr).


'''''Advantages:''''' Long-term dedicated storage over months. Hence, less time spent in marshalling datasets between experiments (to stay within your usage quotas).
== Local disks ==
Some clusters provide direct access to local disks, without the need to reserve them.
This provides a potentially large storage space for the duration of a job, but there is no guarantee that your data will be preserved if you reserve the same node again in the future


'''''Disadvantage:''''' Those of an NFS server. Network latencies also need to be taken into account. Currently, the storage resources are based in rennes site. For experiments running at other sites, one needs to be aware that NFS handles poorly inter-site latencies.
'''''Advantages:''''' Storage is available directly on the hard disk of the reserved node.


== [[Ceph|Managed Ceph clusters]] ==
'''''Disadvantage:''''' This storage cannot be used to store data with medium-term or long-term persistence.
 
'''''Documentation:''''' [[Disk_reservation#Using_local_disks_once_connected_on_the_nodes|Using local disks]]
 
== Managed Ceph clusters ==
These are '''''Object-based storage''''' resources (i.e. not offering a File System interface). They are based in <code class="command">rennes</code> site (~9 TB) and at <code class="command">nantes</code> site (~7 TB). Hence, overall they aggregate to ~15 TB of storage resources. For details about ''Managed Ceph'' resources see [[Ceph|'''''here''''']].
These are '''''Object-based storage''''' resources (i.e. not offering a File System interface). They are based in <code class="command">rennes</code> site (~9 TB) and at <code class="command">nantes</code> site (~7 TB). Hence, overall they aggregate to ~15 TB of storage resources. For details about ''Managed Ceph'' resources see [[Ceph|'''''here''''']].


Line 77: Line 129:


'''''Disadvantages:''''' These are ''Object-based storage'': hence non-accessible using direct Unix filesystem commands.
'''''Disadvantages:''''' These are ''Object-based storage'': hence non-accessible using direct Unix filesystem commands.
'''''Documentation:''''' [[Ceph|Ceph]]

Latest revision as of 08:23, 7 December 2023

Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

This page gives a broad view of the different storage resources that are available for experiments on Grid'5000.

The focus is on Big Data experiments. Advantages and disadvantages are highlighted to help users decide on the best combination of storage resources. This helps in two ways:

  • It helps users achieve better quality in experiments.
  • It avoids excessive usage of any single type of resource by one user, keeping those resources available to other Grid'5000 users.

In Grid'5000, there are multiple resources for data storage in experiments. Each has its own characteristics, advantages and disadvantages. They are summarised in the table below. Further details are discussed in the following sub-sections.

Remember that Grid'5000 does NOT have a BACKUP service for storage it provides: it is your responsibility to save important data outside Grid'5000 (or at least to copy data to several Grid'5000 sites in order to increase redundancy).

Comparison table of storage resources

The following table summarises the comparison of different aspects of storage resources on Grid'5000 - both persistent and non-persistent types:

{| class="wikitable"
|-
! Storage Resource !! Backups? !! Protocol used !! Persistence period !! Provisioning mechanism !! Network connectivity !! Remarks
|-
| /home || No || NFS || long-term || Quota + User Acct mgmt || Variable (1 Gb/s - 10 Gb/s) || Note-1
|-
| Group Storage || No || NFS || long-term || Manual || Variable (1 Gb/s - 10 Gb/s) || Note-1
|-
| On node local disks reservation || No || - || medium-term || OAR || - || Note-2, Note-4
|-
| Managed Ceph cluster || No || RADOS || medium-term || Temporarily free (until contention for space) || ~n × 10 Gb/s (n = parallelism) || Note-3
|-
| /tmp || No || - || short-term (job) || OAR job || - ||
|-
| Local disks || No || - || short-term (job) || OAR job || - || Note-4
|}
  • Note-1: These storage resources use NFS accessed by multiple users. Hence, the performance is strongly dependent on the degree of contention during the time of experiment.
  • Note-2: This storage uses the local disks of the reserved node, but the disks can be reserved separately from the node.
  • Note-3: There are 2 factors to consider for performance:
    • aggregated network bandwidth of all the nodes in the cluster:
      • for the managed Ceph Cluster at rennes (4 nodes each with 1 network interface), n = 4,
      • for the managed Ceph Cluster at nantes (3 nodes each with 1 network interface), n = 3,
    • the aggregated disk bandwidth of all Object Storage Devices (OSD) used in the cluster. The recommendation is to have 1 OSD per physical disk.
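    • For further details on maximum storage capacity and degree of parallelism in Ceph clusters, see the (unmaintained) BigData Ceph5k tutorial, section "Improving performance through higher parallelism (more OSD nodes)".
  • Note-4: Local disks need to be manually partitioned and formatted with a filesystem.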
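
The two factors in Note-3 can be read as a rough upper bound: throughput cannot exceed the smaller of the aggregated network bandwidth (n × 10 Gb/s) and the aggregated disk bandwidth of the OSDs. The sketch below only illustrates that arithmetic. The function names are hypothetical, and the OSD count and per-OSD disk bandwidth in the last example are made-up values, not measurements of the Grid'5000 clusters.

```python
def aggregated_network_gbps(nodes, link_gbps=10.0):
    """Aggregated network bandwidth of `nodes` nodes with one link each."""
    return nodes * link_gbps


def throughput_bound_gbps(nodes, osds, osd_disk_gbps, link_gbps=10.0):
    """Rough upper bound: the smaller of network and disk aggregates."""
    network = aggregated_network_gbps(nodes, link_gbps)
    disks = osds * osd_disk_gbps
    return min(network, disks)


# Node counts taken from Note-3 (one network interface per node).
rennes_network = aggregated_network_gbps(4)
nantes_network = aggregated_network_gbps(3)

# Hypothetical figures: 8 OSDs delivering 1.5 Gb/s each.
example_bound = throughput_bound_gbps(nodes=4, osds=8, osd_disk_gbps=1.5)
```

With these assumed figures the disks, not the network, would be the limiting factor.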

/home

This is the principal storage space when logged in on a Grid'5000 site: site:/home/userid. It is a file system exposed over NFS.

Every user has a /home/userid storage space on every Grid'5000 site (content is neither automatically replicated nor backed up).

In terms of storage size

Each user has a default quota of 25GB of storage on each site (soft limit), with a reserve of 100GB (hard limit).

  • the soft limit is set to what the admins find a reasonable limit for an account on a more or less permanent basis. You can use more disk space temporarily, but you should not try and trick the system to keep that data on the shared file system.
  • the hard limit is set so as to preserve usability for other users if one of your scripts produces unexpected amounts of data. You'll not be able to override that limit.
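
The relation between the two limits can be sketched as a small classification. This is only an illustration of the policy described above, using the default values of 25GB and 100GB; the function name and the returned messages are hypothetical, and the real enforcement is done by the file system quota, not by user code.

```python
SOFT_LIMIT_GB = 25    # default soft limit per site
HARD_LIMIT_GB = 100   # default hard limit per site


def home_quota_state(used_gb, soft_gb=SOFT_LIMIT_GB, hard_gb=HARD_LIMIT_GB):
    """Classify home directory usage against the soft and hard limits."""
    if used_gb < 0:
        raise ValueError("usage cannot be negative")
    if used_gb >= hard_gb:
        return "hard limit reached"
    if used_gb > soft_gb:
        return "over soft limit (temporary use only)"
    return "within quota"
```

For example, 60GB of usage falls between the two limits: it is tolerated temporarily, but the data should not stay on the shared file system.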

Home directories are limited to 200GB. Above that threshold, you should consider asking for a group storage instead, preferably at the team level, since it will be usable and shareable by all of your teammates. If your team does not yet have a group storage, talk to your GGA manager about it.

Should you need higher quotas, please visit your user account settings page at https://api.grid5000.fr/ui/account (homedir quotas tab) and fill in a request.

In terms of storage access

Access to /home can be managed using the Storage Manager.

In terms of number of files

Each user has a default inodes hard quota of 10 million. This means that users aren't able to store more than 10 million files on their home.

Information on your current inode usage can be obtained with the following command:

frontend:
quota

The output of this command looks like this:

user@frontend:~$ quota
Disk quotas for user <login> (uid <uid>):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
nfs:/export/home
                     48  25000000 100000000              12       10000000       0

Inode quota information is given by the last 3 numbers printed by the command. In the output above, 12 files are currently stored, and the hard limit is set to 10 million.
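
As a minimal sketch, the last 3 numbers can also be extracted programmatically from the text printed by quota. The helper names below are hypothetical, and the parsing assumes the output has the same shape as the sample above. Because the sample does not make it obvious which of the two trailing columns holds the limit, the sketch simply takes the larger of the two; that choice is an assumption.

```python
SAMPLE_QUOTA_OUTPUT = """\
Disk quotas for user <login> (uid <uid>):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
nfs:/export/home
                     48  25000000 100000000              12       10000000       0
"""


def last_three_numbers(quota_output):
    """Return the last three integers of the last line holding numbers."""
    for line in reversed(quota_output.splitlines()):
        numbers = [int(tok) for tok in line.split() if tok.isdigit()]
        if len(numbers) >= 3:
            return tuple(numbers[-3:])
    raise ValueError("no numeric quota line found")


def inode_usage(quota_output):
    """Return (files stored, inode limit); the limit is assumed to be
    the larger of the two trailing numbers."""
    files, second, third = last_three_numbers(quota_output)
    return files, max(second, third)


files_used, inode_limit = inode_usage(SAMPLE_QUOTA_OUTPUT)
```

On a frontend, the same functions could be fed the captured output of the quota command instead of the sample string.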

Note

If you have reached your inode hard quota, you won't be able to create new files on your home. You will have to remove files from your home to be able to create new files again.


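In terms of data sharing

Home directories are user protected by default.

The public directory located under the home directory can be used to share data with all other Grid'5000 users, using the following URLs:

  • from inside Grid'5000: http://public.SITE.grid5000.fr/~USERNAME/
  • from outside Grid'5000 (basic authentication with Grid'5000 credentials): https://api.grid5000.fr/sid/sites/SITE/public/USERNAME/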
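
As an illustration, the SITE and USERNAME placeholders of the two public URL patterns for this directory can be filled in with a small helper. The function name and the example login `jdoe` are hypothetical.

```python
def public_urls(site, username):
    """Build the public directory URLs for a given site and user."""
    return {
        # Reachable from inside Grid'5000.
        "inside": f"http://public.{site}.grid5000.fr/~{username}/",
        # Reachable from outside, with Grid'5000 credentials.
        "outside": f"https://api.grid5000.fr/sid/sites/{site}/public/{username}/",
    }


urls = public_urls("rennes", "jdoe")
```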

Group Storage

This service provides large storage spaces, possibly shared between multiple Grid'5000 users. These storage spaces are accessible on Grid'5000 over NFS. Users can request the creation of a group storage by following the procedure described in the Group Storage page.

Advantages: Storage shareable among a group of users, persistent, easy to access.

Disadvantage: The disadvantages are those of an NFS server with multiple simultaneous users.

Documentation: Group Storage

On node local disks reservation

Disk reservation makes it possible to reserve extra local hard disks of nodes, in order to store large datasets locally between reservations and thus avoid moving data to nodes at the beginning of every node reservation. Disk reservation provides medium-term storage persistence. Once reserved, these disks can be used exactly like local disks (see the Local disks section).

Advantages: Storage is available directly on the hard disk of the reserved node.

Disadvantage: You need to reserve a hard disk on a node, and then reserve the same node for carrying out your experiment.

Documentation: Disk reservation

Local disks

Some clusters provide direct access to local disks, without the need to reserve them. This provides a potentially large storage space for the duration of a job, but there is no guarantee that your data will be preserved if you reserve the same node again in the future.

Advantages: Storage is available directly on the hard disk of the reserved node.

Disadvantage: This storage cannot be used to store data with medium-term or long-term persistence.

Documentation: Using local disks

Managed Ceph clusters

These are object-based storage resources (i.e. they do not offer a file system interface). They are located at the rennes site (~9 TB) and the nantes site (~7 TB), for an overall total of ~15 TB of storage resources. For details about Managed Ceph resources, see the Ceph page.

Advantages: Ceph is a distributed object storage system designed to provide excellent performance, reliability and scalability using multiple nodes.

Using Virtual Block Devices with managed Ceph backend: If experiments can tolerate variable performance, the managed Ceph clusters offer additional persistent storage resources. They are also used in this tutorial. One example is the use of virtual block devices to create a persistent database service on a virtual machine.

Disadvantages: This is object-based storage, so it cannot be accessed with ordinary Unix filesystem commands.

Documentation: Ceph