API: Difference between revisions

Revision as of 15:23, 10 February 2009


ReferenceAPI

Synopsis

The reference API provides the reference data of Grid5000. Information such as the list of sites, clusters, nodes and environments can be queried using this API. You can also obtain a specific version of any resource, list all the versions of a given resource, and get an archive of all or part of the data.

REST API

 GET /resource.json

Alternatively, you can omit the format extension and set the Accept HTTP header to the desired MIME type, e.g.:

 GET /resource
 Accept: application/json

If you specify both a format extension and an Accept HTTP header, the Accept header is ignored and the extension takes precedence.
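As a minimal sketch of the two request styles, the following uses only Ruby's standard net/http library to build (but not send) both kinds of request. The /sites/bordeaux path is borrowed from the examples further down this page.

```ruby
require 'net/http'

# Style 1: the format is given by the extension in the path.
by_extension = Net::HTTP::Get.new('/sites/bordeaux.json')

# Style 2: no extension; the format is requested through the Accept header.
by_header = Net::HTTP::Get.new('/sites/bordeaux', 'Accept' => 'application/json')

puts by_extension.path    # /sites/bordeaux.json
puts by_header.path       # /sites/bordeaux
puts by_header['Accept']  # application/json
```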

Get a specific version of a resource

 GET /%resource%.%format%?version=%version%&depth=%depth%
  • Accepted formats: json, zip
  • Version parameter: can be omitted (the latest version is returned), or it can be either a commit id (40 characters long) or a Unix timestamp (number of seconds since the Unix epoch: 1970-01-01 00:00:00 UTC)
  • Depth parameter: can be omitted (by default, a JSON response displays only one level of nested sub-resources). To get more details in a single request, set the depth parameter to a value between 1 and 3. This parameter has no effect with the zip format.
  • Comments: the zip format returns a zip archive containing the set of directories and files corresponding to the requested data, with all its sub-resources.
  • Examples:
 GET /.json 

will return the description of the "root" resource, which is Grid5000:

 HTTP/1.1 200 OK
 Etag: "d69bfd1891582824f5a192fa41a3f444a3d52854"
 Last-Modified: Wed, 28 Jan 2009 13:46:19 GMT
 Content-Type: application/json;charset=utf-8
 Content-Length: 345
 Connection: close
 
 {
   "environments": [
     "\/environments\/sid-x64-base-1.0"
   ],
   "sites": [
     "\/sites\/bordeaux",
     "\/sites\/grenoble",
     "\/sites\/lille",
     "\/sites\/lyon",
     "\/sites\/nancy",
     "\/sites\/orsay",
     "\/sites\/rennes",
     "\/sites\/sophia",
     "\/sites\/toulouse"
   ],
   "uid": "grid5000",
   "type": "grid",
   "uri": "\/"
 }
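The body above is plain JSON, and the "\/" sequences are simply escaped slashes. As a small sketch (the body below is a shortened copy of the response above, not a live request), Ruby's json library turns them back into ordinary paths that can be requested in turn:

```ruby
require 'json'

# Shortened copy of the root resource body shown above.
body = '{"sites": ["\/sites\/bordeaux", "\/sites\/grenoble"], "uid": "grid5000", "type": "grid", "uri": "\/"}'

root = JSON.parse(body)

# Each entry is the URI of a sub-resource; append the format to query it.
root['sites'].each { |site_uri| puts "GET #{site_uri}.json" }
```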
 

From there, you can discover the composition of the grid by following the URIs to get more information about the sub-resources. For example, you can get more details about Bordeaux by querying:

 GET /sites/bordeaux.json

This will return something like:

 HTTP/1.1 200 OK
 Etag: "45f78b07665ed58f843a741d6927d60a4db35ba3"
 Last-Modified: Wed, 28 Jan 2009 13:46:19 GMT
 Content-Type: application/json;charset=utf-8
 Content-Length: 604
 Connection: close
 
 {
   "environments": [
     "\/sites\/bordeaux\/environments\/sid-x64-base-1.0"
   ],
   "name": "Bordeaux",
   "location": "Bordeaux, France",
   "latitude": null,
   "security_contact": null,
   "clusters": [
     "\/sites\/bordeaux\/clusters\/bordemer",
     "\/sites\/bordeaux\/clusters\/bordeplage",
     "\/sites\/bordeaux\/clusters\/bordereau",
     "\/sites\/bordeaux\/clusters\/borderline"
   ],
   "uid": "bordeaux",
   "type": "site",
   "user_support_contact": null,
   "description": "",
   "longitude": null,
   "email_contact": null,
   "web": null,
   "uri": "\/sites\/bordeaux",
   "sys_admin_contact": null
 }

and so on.
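The format, version and depth rules described above can be captured in a small path builder. This is only a sketch: reference_path is a hypothetical helper, not something the API provides, and it assumes commit ids are 40 lowercase hexadecimal characters, as in the examples on this page.

```ruby
require 'uri'

# Hypothetical helper: builds the path and query string of a request,
# checking the parameters against the rules listed above.
def reference_path(resource, format: 'json', version: nil, depth: nil)
  raise ArgumentError, "unsupported format: #{format}" unless %w[json zip].include?(format)

  params = {}
  if version
    v = version.to_s
    # Either a 40-character commit id (assumed hexadecimal) or a Unix timestamp.
    unless v.match?(/\A[0-9a-f]{40}\z/) || v.match?(/\A\d+\z/)
      raise ArgumentError, "invalid version: #{v}"
    end
    params['version'] = v
  end
  if depth
    raise ArgumentError, 'depth must be between 1 and 3' unless (1..3).cover?(depth)
    params['depth'] = depth
  end

  base = resource.chomp('/')
  base = '/' if base.empty? # the root resource is queried as "/.json"
  path = "#{base}.#{format}"
  params.empty? ? path : "#{path}?#{URI.encode_www_form(params)}"
end

puts reference_path('/')                          # /.json
puts reference_path('/sites/bordeaux', depth: 2)  # /sites/bordeaux.json?depth=2
puts reference_path('/', version: 1233141096)     # /.json?version=1233141096
```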

Get the list of all versions of a resource

 GET /%resource%/versions.%format%
  • Accepted formats: json
  • Response: the list of all changes that were made to the resource.

Status codes

  • A 200 status code is returned when the request is successful.
  • A 304 status code is returned when the requested resource has not been modified (used for caching purposes).
  • A 404 status code is returned when a resource does not exist.
  • A 406 status code is returned when the requested format is not available.
  • A 500 status code is returned when the server encountered an error.
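A client will typically branch on these codes. The sketch below shows one possible mapping; action_for and the symbols it returns are illustrative names, not part of the API.

```ruby
# Hypothetical helper: maps the status codes listed above to the action
# a client might take.
def action_for(status)
  case status
  when 200 then :use_response_body
  when 304 then :reuse_cached_copy
  when 404 then :resource_missing
  when 406 then :try_another_format
  when 500 then :server_error
  else :unexpected_status
  end
end

[200, 304, 404, 406, 500].each do |code|
  puts "#{code} -> #{action_for(code)}"
end
```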

Getting Started

Creating an SSH tunnel

The API is only available from the Grid5000 frontends. As a consequence, the first thing you have to do when trying to query the API is to create an SSH tunnel to the API server, via your site's access machine:

 ssh -N -L 8080:131.254.202.98:8080 login@access.site.grid5000.fr

Note: when you are done with your queries, you can hit CTRL-C to destroy the tunnel.
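Before sending queries, it can be handy to check that the tunnel is actually listening. The following is a minimal sketch using Ruby's standard socket library; port_open? is a hypothetical helper, and the host and port are those of the tunnel command above.

```ruby
require 'socket'

# Returns true if something accepts TCP connections on host:port.
def port_open?(host, port, timeout_s = 2)
  Socket.tcp(host, port, connect_timeout: timeout_s) { true }
rescue SystemCallError, SocketError
  false
end

if port_open?('localhost', 8080)
  puts 'Tunnel is up: the API should be reachable on http://localhost:8080'
else
  puts 'Nothing is listening on localhost:8080: start the SSH tunnel first'
end
```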

cURL example

Get the latest version of the composition of the platform:

 $ curl -i http://localhost:8080/reference/0_1/.json
 HTTP/1.1 200 OK
 Date: Thu, 29 Jan 2009 15:23:10 GMT
 Server: Apache
 X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.6
 Etag: "2e16727e9012f1d12e25010921a6a5fe71bda895"
 Last-Modified: Wed, 28 Jan 2009 13:46:19 GMT
 Content-Length: 345
 Connection: close
 Content-Type: application/json;charset=utf-8
 
 {
   "environments": [
     "\/environments\/sid-x64-base-1.0"
   ],
   "uri": "\/",
   "type": "grid",
   "sites": [
     "\/sites\/bordeaux",
     "\/sites\/grenoble",
     "\/sites\/lille",
     "\/sites\/lyon",
     "\/sites\/nancy",
     "\/sites\/orsay",
     "\/sites\/rennes",
     "\/sites\/sophia",
     "\/sites\/toulouse"
   ],
   "uid": "grid5000"
 }

Get the latest version of the platform with a depth of 2 (this will automatically "resolve" the first level of URIs so that the response body includes the description of the sites and environments):

 $ curl -i http://localhost:8080/reference/0_1/.json?depth=2
 
 HTTP/1.1 200 OK
 Date: Fri, 30 Jan 2009 10:34:59 GMT
 Server: Apache
 X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.6
 Etag: "9a69b8ce42e1618b0c0dee3112f5269eaaa28d6d"
 Last-Modified: Thu, 29 Jan 2009 16:50:28 GMT
 Content-Length: 6827
 Connection: close
 Content-Type: application/json;charset=utf-8
 
 ... the response body is too long to be displayed here ...
 

Get the composition of the platform as it was on Wed Jan 28 12:11:36 +0100 2009:

 $ curl -i http://localhost:8080/reference/0_1/.json?version=1233141096
 HTTP/1.1 200 OK
 Date: Thu, 29 Jan 2009 15:22:20 GMT
 Server: Apache
 X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.6
 Etag: "7a518f34482cd7a898b8428bac26eb9a6e457379"
 Last-Modified: Wed, 28 Jan 2009 11:11:36 GMT
 Content-Length: 299
 Connection: close
 Content-Type: application/json;charset=utf-8
 
 {
   "environments": [
     "\/environments\/sid-x64-base-1.0"
   ],
   "uri": "\/",
   "type": "grid",
   "sites": [
     "\/sites\/grenoble",
     "\/sites\/lille",
     "\/sites\/nancy",
     "\/sites\/orsay",
     "\/sites\/rennes",
     "\/sites\/sophia",
     "\/sites\/toulouse"
   ],
   "uid": "grid5000"
 }
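The version value used above is the Unix timestamp of the requested moment. As a short sketch, it can be computed in Ruby from a local time with its UTC offset, and converted back to check it against the Last-Modified header of the response:

```ruby
require 'time'

# The moment used in the example above: Wed Jan 28 12:11:36 +0100 2009.
moment = Time.new(2009, 1, 28, 12, 11, 36, '+01:00')

# The version parameter expects seconds since the Unix epoch (UTC).
puts moment.to_i                        # 1233141096

# Converting back gives the date in the HTTP header format.
puts Time.at(1233141096).utc.httpdate   # Wed, 28 Jan 2009 11:11:36 GMT
```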


Get the composition of the platform as it was after the change 30dc4c4fa25ee63c86a093cf6108259a128561b1:

 $ curl -i http://localhost:8080/reference/0_1/.json?version=30dc4c4fa25ee63c86a093cf6108259a128561b1
 HTTP/1.1 200 OK
 Date: Fri, 30 Jan 2009 08:48:39 GMT
 Server: Apache
 X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.6
 Etag: "c6beebf69985bb57642defb4a3e6c0a3a1491783"
 Last-Modified: Wed, 28 Jan 2009 10:27:37 GMT
 Content-Length: 255
 Connection: close
 Content-Type: application/json;charset=utf-8
 
 {
   "environments": [
     "\/environments\/sid-x64-base-1.0"
   ],
   "uri": "\/",
   "type": "grid",
   "sites": [
     "\/sites\/grenoble",
     "\/sites\/orsay",
     "\/sites\/rennes",
     "\/sites\/sophia",
     "\/sites\/toulouse"
   ],
   "uid": "grid5000"
 }

Get all the versions of the platform:

 $ curl -i http://localhost:8080/reference/0_1/versions.json

Get all the versions of a cluster:

 $ curl -i http://localhost:8080/reference/0_1/sites/rennes/clusters/paramount/versions.json

Get the zip archive that contains the files describing the latest version of the rennes platform:

 $ curl http://localhost:8080/reference/0_1/sites/rennes.zip > rennes.zip

etc.

Ruby example

First, make sure you've got *Ruby* and *Rubygems* installed. Then install the required gems:

 sudo gem install rest-client json --no-ri --no-rdoc

Put this code in a g5k-reference-api-client.rb file. It will output the current list of sites and, for each one, the list of its clusters:

 require 'pp'
 require 'rubygems'
 require 'rest_client' # sudo gem install rest-client
 require 'json'        # sudo gem install json
 
 api = RestClient::Resource.new('http://localhost:8080/reference/0_1')
 begin
   puts "---- Getting Grid5000"
   # start at the root of the reference data (= grid5000)
   grid5000 = JSON.parse api['/'].get(:accept => 'application/json')
   pp grid5000
   puts "\n---- Getting sites"
   grid5000['sites'].each do |site_uri|
     site = JSON.parse api[site_uri].get(:accept => 'application/json')
     pp site
     puts "\n---- Getting #{site['uid']} clusters"
     site['clusters'].each do |cluster_uri|
       cluster = JSON.parse api[cluster_uri].get(:accept => 'application/json')
       pp cluster
     end
   end
 rescue RestClient::ResourceNotFound
   puts 'Resource not found.'
 rescue RestClient::RequestTimeout
   puts 'Timeout.'
 rescue RestClient::Unauthorized
   puts 'Unauthorized.'
 rescue RestClient::RequestFailed
   puts 'Request failed.'
 rescue RestClient::ServerBrokeConnection 
   puts 'Connection broken.' 
 rescue StandardError => e # any other error (avoid rescuing Exception, which also traps interrupts)
   puts e.message
 end

Run with:

 ruby g5k-reference-api-client.rb
 

The above example could be simplified by setting the depth parameter to 3:

 ...  
 api = RestClient::Resource.new('http://localhost:8080/reference/0_1')
 begin
   grid5000 = JSON.parse api['/?depth=3'].get(:accept => 'application/json')
   pp grid5000
 rescue
   ...
 end

Planned

  • Refine node description scheme;
  • Add missing data (environments, nodes);
  • Add XML format;
  • Finish implementing freshness and validation caching, and explain how to use it.

Caching

As described in RFC 2616 (http://tools.ietf.org/html/rfc2616#section-13):

  HTTP is typically used for distributed information systems, where
  performance can be improved by the use of response caches. The
  HTTP/1.1 protocol includes a number of elements intended to make
  caching work as well as possible.

Hence, client applications can (and should) cache the responses so that subsequent requests for the same information use the cached data. The use of caching strategies can dramatically reduce delays and save bandwidth.

That's why most of the responses returned by the Grid5000 APIs include HTTP headers to support one or both of the caching models: expiration-based or validation-based. If you don't know what this means, read this article: http://tomayko.com/writings/things-caches-do.
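To illustrate the validation-based model, here is a minimal in-memory sketch. The fetcher is a hypothetical stand-in for the HTTP layer: with a real client it would send the stored ETag in an If-None-Match request header and return the status, ETag and body of the response. The fake server below only mimics the behaviour described on this page (a 304 when the resource has not been modified).

```ruby
# Minimal validation cache. `fetcher` is a hypothetical callable that receives
# the path and the ETag of the cached copy (or nil) and returns
# [status, etag, body].
class ValidatingCache
  def initialize(fetcher)
    @fetcher = fetcher
    @entries = {} # path => { etag:, body: }
  end

  def get(path)
    cached = @entries[path]
    status, etag, body = @fetcher.call(path, cached && cached[:etag])
    if status == 304 && cached
      cached[:body] # not modified: reuse the stored body
    else
      @entries[path] = { etag: etag, body: body } if status == 200 && etag
      body
    end
  end
end

# Fake server for demonstration: answers 304 when the client's ETag matches.
current_etag = '"d69bfd1891582824f5a192fa41a3f444a3d52854"'
calls = []
fake_server = lambda do |path, if_none_match|
  calls << [path, if_none_match]
  if if_none_match == current_etag
    [304, current_etag, nil]
  else
    [200, current_etag, '{"uid": "grid5000"}']
  end
end

cache = ValidatingCache.new(fake_server)
first  = cache.get('/.json') # full response, stored with its ETag
second = cache.get('/.json') # revalidated: the server replies 304
puts first == second
```

The server is still contacted on every call, but a 304 response carries no body, so only the first request pays for the full transfer.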

Below are the different schemes that can exist when the API returns cacheable responses.

1. no caching

  O                           |-----|
 -|-  <-------internet------> | API |
 / \                          |-----| 

2. the API builds the response once and stores it into cache for a certain amount of time.

  O                           |-----|       |-----|
 -|-  <-------internet------> |CACHE| <---> | API |
 / \                          |-----|       |-----|

3. the client receives the response once and stores it into cache for a certain amount of time.

  O         |-----|                         |-----|
 -|-  <---> |CACHE| <-------internet------> | API |
 / \        |-----|                         |-----|

4. both the client and the API have a cache in front of them.

  O         |-----|                         |-----|       |-----|
 -|-  <---> |CACHE| <-------internet------> |CACHE| <---> | API |
 / \        |-----|                         |-----|       |-----|

Most of the Grid5000 APIs use some form of caching on their side (scheme 2). It is highly recommended that client applications also include a caching strategy in their implementation (which gives scheme 4): this saves bandwidth, reduces latency, and may improve the client's tolerance to network outages.

Only a few HTTP libraries natively support client-side caching (e.g. httplib2 in Python: http://code.google.com/p/httplib2/). In Ruby, none correctly supports the whole of RFC 2616. However, a basic caching strategy (in-memory or file-based) is easy to implement. Alternatively, you can use my snippet of code (http://gist.github.com/58095), which subclasses the RestClient Resource into a CacheableResource that uses the Rack::Cache library to provide transparent in-memory, file-based or memcached-based caching.
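As an illustration of how little a basic in-memory strategy needs, here is a minimal expiration-based sketch. It is not the CacheableResource snippet mentioned above. The fixed TTL and the injectable clock are assumptions made for this example; a real client would derive freshness from the Cache-Control or Expires headers of each response.

```ruby
# Minimal in-memory expiration cache with a fixed time-to-live.
class ExpiringCache
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {} # key => [stored_at, value]
  end

  # Returns the cached value for key while it is fresh; otherwise runs the
  # block, stores its result and returns it.
  def fetch(key)
    stored_at, value = @entries[key]
    return value if stored_at && @clock.call - stored_at < @ttl

    value = yield
    @entries[key] = [@clock.call, value]
    value
  end
end

# Demonstration with a fake clock and a block standing in for the HTTP request.
now = Time.at(0)
cache = ExpiringCache.new(60, clock: -> { now })
requests = 0
load_root = -> { requests += 1; '{"uid": "grid5000"}' }

cache.fetch('/.json', &load_root) # miss: runs the block
cache.fetch('/.json', &load_root) # hit: served from memory
now += 61
cache.fetch('/.json', &load_root) # expired: runs the block again
puts requests                     # 2
```

In a real client, the block would perform the request, for example with the RestClient resource used in the Ruby example above.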

Links of interest