Storage5k

Revision as of 14:34, 16 January 2012

Grid'5000 storage resources

  Site       VM Name                    Disk partition   Size      Status
  Sophia     stock.sophia.grid5000.fr   /dev/sda         2000 GB   available
  Grenoble   -                          -                -         not available
  Lille      -                          -                -         not available
  Lyon       -                          -                -         not available
  Nancy      -                          -                -         not available
  Orsay      -                          -                -         not available
  Rennes     -                          -                -         not available
  Bordeaux   -                          -                -         not available
  Toulouse   -                          -                -         not available

Purpose

This page presents how to use storage5k to reserve storage space in Grid'5000.

Overview

Each user has a default storage space (/home) exported via NFS. For larger needs, a user can reserve an additional storage space for a determined period: storage5k allows users to reserve "storage" resources with OAR.

Usage

First, check the chunk size of the OAR storage resources:

malachheb@fsophia:~$ storage5k -a chunk_size
Chunk size: 10737418240 (10 GB)

In this example the chunk size is 10 GB, so for a reservation of 100 GB you should ask for 10 chunks.
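The chunk count is a ceiling division of the desired size by the chunk size. A minimal sketch, using the 10 GB chunk size reported above and a hypothetical 100 GB target:

```shell
# Hypothetical figures: the 10 GB chunk size reported above and a
# desired reservation of 100 GB.
chunk_size_gb=10
desired_gb=100

# Ceiling division, so the reservation is never smaller than requested.
chunks=$(( (desired_gb + chunk_size_gb - 1) / chunk_size_gb ))
echo "chunks=$chunks"   # then: storage5k -a add -l chunks=$chunks
```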

Storage reservation

A storage reservation is an OAR job with resources of type "storage". For each storage reservation, specify the number of chunks and the walltime (1 hour by default).

You can submit a storage reservation in two ways:

  • with the storage5k command:
Terminal.png frontend:
storage5k -a add -l chunks=2,walltime=2
 malachheb@fsophia:~$ storage5k -a add -l chunks=2,walltime=2
 [ADMISSION RULE] Modify resource description with type constraints
 [ADMISSION RULE] Modify resource description as storage
 Generate a job key...
 OAR_JOB_ID=648
 Reservation mode : waiting validation...
 Reservation valid --> OK
  • with the oarsub command:
Terminal.png frontend:
oarsub -t deploy -l chunks=2,walltime=2 -r "2011-12-09 11:57:00"
 malachheb@fsophia:~$ oarsub -t deploy -l chunks=2,walltime=2 -r "2011-12-09 11:57:00"
 [ADMISSION RULE] Modify resource description with type constraints
 [ADMISSION RULE] Modify resource description as storage
 Generate a job key...
 OAR_JOB_ID=649
 Reservation mode : waiting validation...
 Reservation valid --> OK
Note.png Note

A reservation can take a few minutes if the requested space is large, because the reserved partition must be initialized on the storage server.

Warning.png Warning

Do not submit a storage reservation together with compute nodes or other OAR resources (vlan, subnet) in the same job. Otherwise you may get unexpected behaviour (this is a temporary limitation while things get ironed out).

Check your storage reservation

Each storage reservation is identified by the job cpuset_name (e.g., malachheb_649). To display the details of your reservations, use "storage5k -a info".

Terminal.png frontend:
storage5k -a info
  malachheb@fsophia:~$ storage5k -a info 
  storage5k 1.0.0
  Job_Id: 648
   State: Running
   Reservation Id: malachheb_648
   Chunks: 2
   Chunks_size: 10737418240
   Space reserved: 21474836480
   Data Directory: /data/malachheb_648
   Free space: 20018364416
   Used space: 1119879168
   Source nfs: stock.sophia.grid5000.fr:/data/malachheb_648

See storage5k --help for the list of additional options. You can also use the oarstat command to display the OAR job behind your storage reservation.
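The sizes in the info output are raw byte counts; a quick conversion sketch, using the Space reserved value from the example above:

```shell
# Convert the byte counts printed by "storage5k -a info" to GiB.
space_reserved=21474836480   # "Space reserved" from the example above
echo "$(( space_reserved / 1024 / 1024 / 1024 )) GiB"
```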

Each reservation is mounted by default under "/data/" (e.g., /data/malachheb_648). You can create a soft link to this directory in your home directory.
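A minimal sketch of such a link, reusing the example reservation name above (the link name data_648 is arbitrary):

```shell
# Expose the reserved space in your home directory through a soft link.
# /data/malachheb_648 is the example reservation from above.
ln -sfn /data/malachheb_648 "$HOME/data_648"
readlink "$HOME/data_648"   # prints /data/malachheb_648
```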

Use your reserved space

The reserved space is meant to store the input/output of your experiments, so you will need to mount it on your compute nodes. There are two cases: nodes reserved by an interactive job, and nodes reserved by a deploy job.

On nodes reserved by an interactive job

All your reserved storage spaces are mounted/unmounted automatically on these nodes.

  malachheb@fsophia:~$ oarsub -I 
  [ADMISSION RULE] Set default walltime to 3600.
  [ADMISSION RULE] Modify resource description with type constraints
  Generate a job key...
  OAR_JOB_ID=488925
  Interactive mode : waiting...
  Starting...
  Connect to OAR job 488925 via the node suno-7.sophia.grid5000.fr
  malachheb@suno-7:~$ df -ah
  Filesystem            Size  Used Avail Use% Mounted on
  /dev/sda2             5.6G  2.3G  3.0G  44% /
  tmpfs                  16G     0   16G   0% /lib/init/rw
  proc                     0     0     0   -  /proc
  sysfs                    0     0     0   -  /sys
  udev                   16G  120K   16G   1% /dev
  tmpfs                  16G  4.0K   16G   1% /dev/shm
  devpts                   0     0     0   -  /dev/pts
  /dev/sda5             533G  198M  506G   1% /tmp
  kadeploy:/grid5000/    79G   62G   14G  83% /grid5000
  nfs.sophia.grid5000.fr:/export/home
                      12T  1.2T   11T  10% /home
  fusectl                  0     0     0   -  /sys/fs/fuse/connections
  none                     0     0     0   -  /dev/cpuset
  stock.sophia.grid5000.fr:/data/malachheb_488924
                     9.9G   23M  9.9G   1% /data/malachheb_488924
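As the transcript shows, the reserved spaces appear under /data/<user>_<jobid>. A small sketch to pick out just those mount points from the df output:

```shell
# List only the storage5k mount points: on a reserved node the reserved
# spaces show up under /data/<user>_<jobid>, in the last df column.
df -ah | awk '$NF ~ "^/data/" { print $NF }'
```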



On nodes reserved by a deploy job

First, make an OAR deploy reservation and deploy your system image with kadeploy3 on the reserved nodes.

Terminal.png frontend:
oarsub -t deploy -l nodes=2 -r "2011-12-24 10:00:00"
Terminal.png frontend:
kadeploy3 -f $OAR_NODE_FILE -k -e squeeze-x64-nfs


Warning.png Warning


The deployed image must contain the nfs-common package; if it does not, install the package manually on the reserved nodes.
The following reference images already contain nfs-common: lenny-x64-nfs, lenny-x64-big, squeeze-x64-nfs, squeeze-x64-big, debian-x64-5-prod, squeeze-x64-prod.
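A quick sketch of checking a deployed node before mounting: mount.nfs is the NFS mount helper shipped by nfs-common on Debian, so its absence is a simple indicator (the install command in the comment assumes a Debian-based image):

```shell
# Check for the mount.nfs helper (shipped by nfs-common on Debian)
# before trying to mount the reserved space on this node.
if ! command -v mount.nfs >/dev/null 2>&1; then
    echo "nfs-common is not installed"
    # on a Debian image: apt-get update && apt-get install -y nfs-common
fi
```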

TODO:
  • give an example of an OAR deploy reservation with a kadeploy command (oarsub -t deploy -l nodes=2 'echo "START"; kadeploy3 -f $OAR_NODE_FILE -k -e squeeze-x64-base; echo "DONE"; sleep 7200')
  • storage5k should be made idempotent, i.e. it should keep reporting success if the space is already mounted and the user tries to mount it again.
  • storage5k should return an exit status != 0 if one of the commands failed.
  • the --add option of storage5k should be renamed --action to be more coherent.
  • explain that you must be logged in as your Grid'5000 username on the nodes if you want to be able to list and create files in the reserved storage spaces.
  • explain that it does not work with a non-NFS reference image, unless you add a user with the same uid/gid as your Grid'5000 username.

On these nodes, you must mount/umount your reserved space manually, as follows.

  • Mount/umount ALL reserved spaces on ALL reserved nodes of the job (here, deploy job 651 contains two nodes, suno-6 and suno-7):
Terminal.png frontend:
storage5k -a mount -j 651
  malachheb@fsophia:~$ storage5k -a mount -j 651
  mount the storage reservation malachheb_651 in node suno-6.sophia.grid5000.fr [ Success]
  mount the storage reservation malachheb_651 in node suno-7.sophia.grid5000.fr [ Success]
Terminal.png frontend:
storage5k -a umount -j 651
  malachheb@fsophia:~$ storage5k -a umount -j 651                                                                                                                                                             
  umount the storage reservation malachheb_651 in node suno-6.sophia.grid5000.fr [ Success]
  umount the storage reservation malachheb_651 in node suno-7.sophia.grid5000.fr [ Success]
  • Mount/umount ALL reserved spaces on ONE reserved node:
Terminal.png frontend:
storage5k -a mount -j 651 -m suno-6.sophia.grid5000.fr
  malachheb@fsophia:~$ storage5k -a mount -j 651 -m suno-6.sophia.grid5000.fr
  mount the storage reservation malachheb_651 in node suno-6.sophia.grid5000.fr [ Success]
Terminal.png frontend:
storage5k -a umount -j 651 -m suno-6.sophia.grid5000.fr
  malachheb@fsophia:~$ storage5k -a umount -j 651 -m suno-6.sophia.grid5000.fr                                                                                                                                                            
  umount the storage reservation malachheb_651 in node suno-6.sophia.grid5000.fr [ Success]
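For a job with many nodes, the per-node form above can be scripted. This is only a dry-run sketch that prints the commands it would run (job id and node names are the examples above; on the frontend the node list would come from uniq $OAR_NODE_FILE):

```shell
# Dry run: print one storage5k mount command per node of the job.
jobid=651
nodes="suno-6.sophia.grid5000.fr suno-7.sophia.grid5000.fr"

for node in $nodes; do
    echo storage5k -a mount -j "$jobid" -m "$node"   # drop echo to run
done
```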

Delete your storage reservation

  • storage5k -a del -j $jobid
Terminal.png frontend:
storage5k -a del -j 488787
  malachheb@fsophia:~$ storage5k -a del -j 488787
  Deleting the job = 488787 ...REGISTERED.
  The job(s) [ 488787 ] will be deleted in a near future.
Warning.png Warning

If you remove your storage reservation, you lose all data in the reserved space.
Remember to save your data before deleting your storage reservation.
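One way to save the data is to archive the reserved space into /home before deleting. A sketch, using a temporary directory as a stand-in for the reserved space (on Grid'5000 the source would be the real path, e.g. /data/malachheb_648):

```shell
# Stand-in for the reserved space; on Grid'5000 this would be a
# /data/<user>_<jobid> directory.
data=$(mktemp -d)
echo "results" > "$data/exp1.log"

# Archive everything into $HOME before running "storage5k -a del".
tar czf "$HOME/storage-backup.tar.gz" -C "$data" .
tar tzf "$HOME/storage-backup.tar.gz"   # verify the archive contents
```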

TODO: shouldn't it attempt to umount everything before deleting the reservation? Otherwise you get "df: `/data/crohr_488864': Stale NFS file handle" when listing mounted disks.