Energy consumption monitoring tutorial

Latest revision as of 08:18, 4 June 2024

Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

Introduction

This tutorial will show how to monitor energy on Grid'5000.

On the Lyon, Grenoble and Nancy sites, special devices (called "wattmetres") allow fine-grained measurements (50 measurements per second, with sub-watt resolution). In addition, the electrical power consumption of nodes may sometimes be retrieved from their Power Distribution Units (PDUs), the devices which supply them with electrical power. While less precise, many clusters also have monitoring capabilities provided by their "BMC" (server-specific administration board).

Grid'5000 uses the Kwollect tool to provide a convenient and consistent way to monitor energy consumption, and other monitoring metrics, in experiments.

In this tutorial, you will learn how to retrieve the energy consumed by Grid'5000 nodes. The power consumption will be studied under various workload scenarios and combinations of CPU energy-saving parameters (P-State, C-State, etc.).

This tutorial requires a basic knowledge of Grid'5000 usage (i.e. having completed the Getting Started tutorial).

Retrieving energy consumption data

Using Kwollect

Kwollect is a monitoring tool focused on environmental metrics such as electrical consumption. On Grid'5000, it permanently collects metrics on every node, network equipment, PDU and wattmetre, and stores them in long-term storage. Collected metrics are exposed to users through the Grid'5000 API and a visualization dashboard based on Grafana.

Kwollect usage on Grid'5000 has dedicated documentation.

The visualization dashboard is available at https://api.grid5000.fr/stable/sites/SITE/metrics/dashboard (for instance: Lyon). It can display a "live" view of the energy being consumed by a node or by a group of nodes inside an OAR reservation. However, for experimental purposes, it may be more useful to access the raw values through the API.

The Grid'5000 API is particularly suited to retrieving measurements performed in the past. For instance, to get the power consumed by nodes "taurus-1" and "taurus-3" at Lyon, as reported by wattmetres, between 10:35 and 10:40 on March 21, 2021, use the URL:

https://api.grid5000.fr/stable/sites/lyon/metrics?nodes=taurus-1,taurus-3&metrics=wattmetre_power_watt&start_time=2021-03-21T10:35&end_time=2021-03-21T10:40

(If you use this URL on a command line, quote it so that '&' is not interpreted as the shell's job control operator, which would put the command in the background.)
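
As a minimal sketch, the same request can be built and sent from Python. The helper names `build_metrics_url` and `fetch_metrics` are hypothetical; the query parameters are those of the URL shown above, and the third-party `requests` library is assumed to be installed. The network call is kept in a separate function (assumed to be run from inside Grid'5000) so that the URL construction can be checked offline.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.grid5000.fr/stable"

def build_metrics_url(site, nodes, metric, start_time, end_time):
    """Build a Kwollect metrics URL (hypothetical helper)."""
    query = {
        "nodes": ",".join(nodes),
        "metrics": metric,
        "start_time": start_time,
        "end_time": end_time,
    }
    # Keep ',' and ':' unescaped so the URL has the same form as in this tutorial.
    return "%s/sites/%s/metrics?%s" % (API_ROOT, site, urlencode(query, safe=",:"))

def fetch_metrics(url):
    """Send the request and return the decoded JSON (assumed: run inside Grid'5000)."""
    import requests  # third-party; imported here so the URL helper works without it
    response = requests.get(url)
    response.raise_for_status()
    return response.json()

url = build_metrics_url("lyon", ["taurus-1", "taurus-3"], "wattmetre_power_watt",
                        "2021-03-21T10:35", "2021-03-21T10:40")
print(url)
```

Building the query with `urlencode` also avoids the shell quoting problem, since the URL never passes through a command line.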

Note that the time range provided should be of the same order of magnitude as a typical job duration (e.g. no more than a few hours). Otherwise, requests must be serialized.
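
One way to follow this advice, sketched here under assumptions, is to cut a long period into consecutive chunks and query them one after another. The function name `split_time_range` and the one-hour default chunk size are illustrative choices, not values prescribed by Grid'5000.

```python
def split_time_range(start, end, max_span=3600):
    """Split [start, end) (Unix timestamps) into consecutive chunks
    of at most max_span seconds (max_span is an arbitrary assumption)."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + max_span, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks

# Example: a 2.5-hour window split into chunks of at most one hour.
for chunk in split_time_range(0, 9000, 3600):
    print(chunk)
```

Each chunk can then be used as the `start_time` and `end_time` of one request, waiting for a request to complete before sending the next.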

Other power consumption metrics may be available on clusters, such as pdu_outlet_power_watt and bmc_node_power_watt. See the Kwollect documentation for the full list of metrics.

Note that by default, Kwollect collects wattmetre values once per second (it stores the average of the 50 measurements performed over that second). If you need all 50 measurements per second, you must tell Kwollect to enable the wattmetre's high-frequency monitoring for your job at submission time:

$ oarsub -I -t monitor='wattmetre_power_watt'

Raw wattmeters data

The Lyon, Grenoble and Nancy sites provide dedicated devices, called "wattmetres", to monitor energy consumption (more information in Grenoble:Wattmetre, Lyon:Wattmetre, Nancy:Wattmetre). Nodes may be powered by one power supply unit (e.g. clusters in Lyon, gros cluster in Nancy, troll cluster in Grenoble) or two (e.g. yeti cluster in Grenoble). All power supply units are measured individually by the wattmetres, which report the electrical power consumed every 20 milliseconds (50 Hz), with a precision of 0.1 watts.

As seen above, the wattmetre values are provided by Kwollect. In addition, the "raw" data collected by wattmetres, including the 50 measurements made each second, is stored in CSV files that can be downloaded from inside the Grid'5000 network at: http://wattmetre.lyon.grid5000.fr/data, http://wattmetre.grenoble.grid5000.fr/data and http://wattmetre.nancy.grid5000.fr/data. Downloading raw data files might be more appropriate than using Kwollect to get monitoring values over a long period of time.

For each wattmetre, a new file is recorded at the beginning of every hour (files from past hours are kept compressed). The file name format is "power.csv.<YYYY-MM-DD>T<HH>", where <YYYY-MM-DD> is the date of the recording and <HH> the hour when it began.
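
The naming scheme can be sketched in Python as follows. The function name `raw_file_name` is hypothetical, and two points are assumptions: the time zone used in file names is not stated here (the sketch uses the datetime as given), and the extension added to compressed files from past hours is not covered.

```python
from datetime import datetime

def raw_file_name(moment):
    """Return the name of the hourly CSV file covering the given datetime.

    Assumption: `moment` is already expressed in the time zone used by the
    wattmetre file names, which this tutorial does not specify.
    """
    return "power.csv." + moment.strftime("%Y-%m-%dT%H")

print(raw_file_name(datetime(2021, 3, 21, 10, 35)))
```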

Here is the meaning of columns in the CSV files:

  • 1st and 2nd columns: Debugging information (these columns will be removed in the future)
  • 3rd column: Timestamp when the measure was performed (as number of seconds and nano-seconds since 00:00:00 1970-01-01 UTC).
  • 4th column: Must be "OK" if the measure has been performed correctly; otherwise the line should be discarded
  • From 5th column to the last: Electrical power consumed for each wattmetre's port. The 5th column shows value for port number 0, the 6th for port number 1, etc. (beware that for yeti cluster in Grenoble, several ports are used to supply a single node). Sometimes the value may be missing for a particular port. It means that wattmetre was not able to compute it correctly.
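
The column layout above can be turned into a small parser. This is a sketch under assumptions: the function name `parse_wattmetre_row` is hypothetical, the timestamp is assumed to be written as a single decimal number (seconds, then a fractional part in nanoseconds), and the sample rows are made up for illustration. Converting the timestamp to a float loses some nanosecond precision.

```python
import csv
import io

def parse_wattmetre_row(row):
    """Parse one CSV row into (timestamp, {port: watts or None}).

    Returns None when the 4th column is not "OK", since such measures
    should be discarded. A missing value for a port is reported as None.
    """
    if len(row) < 4 or row[3] != "OK":
        return None
    timestamp = float(row[2])  # assumed format: seconds.nanoseconds
    ports = {}
    for port, cell in enumerate(row[4:]):  # 5th column is port 0, 6th is port 1, ...
        ports[port] = float(cell) if cell.strip() else None
    return timestamp, ports

# Hypothetical sample, made up for illustration only.
sample = io.StringIO(
    "dbg1,dbg2,1616322900.020000000,OK,95.3,,101.7\n"
    "dbg1,dbg2,1616322900.040000000,ERR,0,0,0\n"
)
for row in csv.reader(sample):
    print(parse_wattmetre_row(row))
```

For nodes supplied through several ports (such as yeti nodes in Grenoble), the values of the corresponding ports would have to be summed to obtain the node's total power.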

The mapping between wattmetres' ports and Grid'5000 nodes is available in the Reference API. For instance, nodes connected to "wattmetre1" at Lyon are described at:

https://api.grid5000.fr/stable/sites/lyon/pdus/wattmetre1.json

under the "ports" section, and the wattmetre and port number connected to the "taurus-1" node are available at:

https://api.grid5000.fr/stable/sites/lyon/clusters/taurus/nodes/taurus-1.json

under the "pdu" section.

Intel RAPL data

For security reasons, reading power data through the powercap interface requires that you first authorize non-privileged user access to /sys/class/powercap/intel-rapl*/*/energy_uj on the node.

For example with sudo-g5k in the std environment:

$ sudo-g5k chmod 444 /sys/class/powercap/intel-rapl/intel-rapl*/energy_uj
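
Once the files are readable, power can be derived from two readings of the energy counter, which is expressed in microjoules. The sketch below is illustrative: the function names are hypothetical, and the directory of CPU package 0 (`intel-rapl:0`) is assumed to follow the usual powercap layout. Note that RAPL reports the consumption of the processor package, not of the whole node.

```python
import time

RAPL_DIR = "/sys/class/powercap/intel-rapl/intel-rapl:0"  # assumed: CPU package 0

def read_counter(name):
    """Read an integer powercap file such as energy_uj."""
    with open("%s/%s" % (RAPL_DIR, name)) as f:
        return int(f.read())

def rapl_average_power(energy_start_uj, energy_end_uj, seconds, max_range_uj):
    """Average power in watts between two energy_uj readings.

    Handles a single wraparound of the counter, which restarts from zero
    after reaching max_energy_range_uj.
    """
    delta = energy_end_uj - energy_start_uj
    if delta < 0:  # the counter wrapped around between the two readings
        delta += max_range_uj
    return delta / 1e6 / seconds

def measure(seconds=1.0):
    """Sample the counter twice, `seconds` apart, and return the average power."""
    max_range = read_counter("max_energy_range_uj")
    t0 = time.monotonic()
    start = read_counter("energy_uj")
    time.sleep(seconds)
    end = read_counter("energy_uj")
    elapsed = time.monotonic() - t0
    return rapl_average_power(start, end, elapsed, max_range)
```

On a node where the permissions have been changed as shown above, `print(measure(1.0))` would display the average package power over about one second.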

Power consumption under different workloads

In the previous section, we learned how to retrieve energy consumption information. In this part, we will illustrate these monitoring features in an example scenario: we will show how energy consumption evolves under different workloads, and the impact of various energy-related CPU parameters.

Preliminary remarks

  • In the examples given in this part, we will use Kwollect through the Grid'5000 API.
  • In this scenario, you need to reserve one node and install some additional tools on it. As you will need root privileges, you can use sudo-g5k to get sudo rights, or use kadeploy to deploy your own environment. Then, you can install the required tools with the following command:
apt update && apt install linux-cpupower sysbench
  • The solutions are given in Python 3 and can easily be copied and pasted into the ipython3 interpreter, which can be installed with:
apt install ipython3

Workload examples

We will consider 3 different workloads:

  1. Idle: Nothing is done on the machine
  2. CPU intensive, mono-threaded: The machine runs a CPU-intensive application on one of its cores. We will use the "sysbench" benchmarking tool to mimic this workload, invoked with:
    sysbench --test=cpu --cpu-max-prime=50000 --num-threads=1 run
  3. CPU intensive, multi-threaded: The machine runs a CPU-intensive application on all of its cores. We will also use "sysbench", invoked with:
NUM_THREADS=$(getconf _NPROCESSORS_ONLN)
sysbench --test=cpu --cpu-max-prime=50000 --num-threads=$NUM_THREADS run

($NUM_THREADS is the number of threads to run; we use the number of cores available on the node)

Impact of CPU parameters

Several CPU parameters are available to try to lower the energy consumed under certain workloads. In particular:

  • C-States configuration is the ability for processors and cores to go to energy saver "sleep states" when not being used.
  • P-States policy dynamically adjusts voltage and frequency of cores to fit workload
  • Turboboost allows cores to run at higher frequency while they stay under temperature specification limits.

In this example scenario, we will investigate two different C-States configurations: partially enabled (the maximum authorized sleep state is C1; this is the default on Grid'5000) and fully enabled (all sleep states are allowed; the deepest sleep state on modern machines is usually C6). To change the maximum allowed sleep state, we will use the cpupower command. For instance, to allow all available sleep states, use:

cpupower idle-set -E

To disable sleep states that take more than 20 microseconds to wake up from (i.e. disable C-States deeper than C1):

cpupower idle-set -D 20

We will also study the impact of turboboost by enabling (which is the default on Grid'5000) or disabling it. To disable turboboost, the following command must be used:

echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo


Scenario implementation

We propose to study the following metrics:

  • Average electrical power required to run workload
  • Time needed to run CPU workload
  • The ops per watt value, i.e. the average number of operations per second and per watt, a metric reflecting the "energy efficiency" of machines

The average electrical power required to run the workload is the amount of electrical energy spent during its execution divided by the execution time. Its value can be approximated as the average of the power values which have been monitored during execution.
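
The relation between energy, time and average power can be illustrated with a short sketch. The function names and the sample values are hypothetical. Energy is estimated here by integrating timestamped power samples with the trapezoidal rule; the plain average of the samples mentioned above is a simpler approximation that gives close results when samples are evenly spaced.

```python
def energy_joules(samples):
    """Integrate (timestamp_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)
    return total

def mean_power_watts(samples):
    """Energy spent divided by the elapsed time between first and last sample."""
    duration = samples[-1][0] - samples[0][0]
    return energy_joules(samples) / duration

def ops_per_watt(operations, duration_s, mean_power_w):
    """Operations per second and per watt (equivalently, operations per joule)."""
    return operations / duration_s / mean_power_w

# Made-up samples: (time in seconds, power in watts)
samples = [(0, 100.0), (1, 100.0), (2, 200.0)]
print(energy_joules(samples))            # 250.0
print(mean_power_watts(samples))         # 125.0
print(ops_per_watt(10000, 10.0, 125.0))  # 8.0
```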

Using your favorite programming language, write a function that queries the Grid'5000 API to return the average power used by a Grid'5000 node between two dates (as Unix timestamps).


Solution (in Python)

import requests
# You may need to install requests for Python 3 with: sudo-g5k apt install python3-requests
from statistics import mean

def get_power(node, site, start, stop, metric="wattmetre_power_watt"):
    """Return the average power (in watts) of a node between two Unix timestamps."""
    url = "https://api.grid5000.fr/stable/sites/%s/metrics?metrics=%s&nodes=%s&start_time=%s&end_time=%s" \
            % (site, metric, node, int(start), int(stop))
    # verify=False disables TLS certificate verification
    data = requests.get(url, verify=False).json()
    # Each returned item carries one measurement in its 'value' field
    return mean(item['value'] for item in data)
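
When several nodes are requested at once, their values have to be averaged separately. The sketch below assumes that each returned item identifies its node in a `device_id` field, next to the `value` field used in the solution; this field name is an assumption to check against the Kwollect documentation, and the sample items are made up.

```python
from collections import defaultdict
from statistics import mean

def mean_power_by_node(items):
    """Group measurements by node and average them.

    Assumption: each item has a 'device_id' field naming the node,
    in addition to the 'value' field.
    """
    values = defaultdict(list)
    for item in items:
        values[item["device_id"]].append(item["value"])
    return {node: mean(node_values) for node, node_values in values.items()}

# Hypothetical response items, for illustration only.
items = [
    {"device_id": "taurus-1", "value": 100.0},
    {"device_id": "taurus-1", "value": 110.0},
    {"device_id": "taurus-3", "value": 90.0},
]
print(mean_power_by_node(items))
```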


Idle workload

First, we are going to investigate how C-States influence energy consumed when machine is idle.

Turn off C-States and leave the machine idle. What is the average power consumed during the last ten seconds? Turn on C-States and repeat. How many watts have been saved by C-States?


Solution

from os import system
from time import sleep, time

# Turn off C-States
system("sudo cpupower idle-set -D0")
sleep(20)
power_cstate_off = get_power("nova-6", "lyon", time()-20, time()-10)

# Turn on C-States
system("sudo cpupower idle-set -E")
sleep(20)
power_cstate_on = get_power("nova-6", "lyon", time()-20, time()-10)

print(power_cstate_off - power_cstate_on)


CPU intensive, mono-threaded, workload

We are now going to run CPU intensive workload and see how CPU parameters influence the average power consumption but also the time spent to execute the workload.

For instance, turn off C-States and Turboboost, measure the workload runtime, and then get the average power consumed. Repeat with C-States turned on, with and without Turboboost. Which combination consumes the least power? Which one runs fastest? Which one has the best ops/watt ratio?


Solution

from os import system
from time import sleep, time

# Turn off C-States and Turboboost
system("sudo cpupower idle-set -D0")
system("echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo")

# Run workload
start = time()
system("sysbench --test=cpu --cpu-max-prime=20000 run")
stop = time()

# Get results
sleep(5)
power = get_power("nova-6", "lyon", start, stop)
result_1 = "C-States OFF, Turbo OFF, Duration: %f, Power: %f" % (stop-start, power)


# Turn on C-States
system("sudo cpupower idle-set -E")

# Run workload
start = time()
system("sysbench --test=cpu --cpu-max-prime=20000 run")
stop = time()

# Get results
sleep(5)
power = get_power("nova-6", "lyon", start, stop)
result_2 = "C-States ON, Turbo OFF, Duration: %f, Power: %f" % (stop-start, power)


# Turn on Turboboost
system("echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo")

# Run workload
start = time()
system("sysbench --test=cpu --cpu-max-prime=20000 run")
stop = time()

# Get results
sleep(5)
power = get_power("nova-6", "lyon", start, stop)
result_3 = "C-States ON, Turbo ON, Duration: %f, Power: %f" % (stop-start, power)

# Print results
print(result_1)
print(result_2)
print(result_3)


CPU intensive, multi-threaded, workload

We are now going to repeat the same experiment with a multi-threaded workload, running on every core of the machine. Run the workload with and without C-States and Turboboost activated and observe the runtime and the power consumed. What can you say about the influence of CPU parameters on a multi-threaded, CPU-intensive workload? Is running multi-threaded more energy efficient?


Solution

from os import system
from time import sleep, time
import requests

# Get core count
core_count = requests.get(
                 "https://api.grid5000.fr/stable/sites/lyon/clusters/nova/nodes/nova-1",
                 verify=False
                 ).json()['architecture']['nb_cores']

# Turn off C-States and Turboboost
system("sudo cpupower idle-set -D0")
system("echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo")

# Run workload
start = time()
system("sysbench --test=cpu --cpu-max-prime=50000 --num-threads=%s run" % core_count)
stop = time()

# Get results
sleep(5)
power = get_power("nova-6", "lyon", start, stop)
result_1 = "C-States OFF, Turbo OFF, Duration: %f, Power: %f" % (stop-start, power)


# Turn on C-States
system("sudo cpupower idle-set -E")

# Run workload
start = time()
system("sysbench --test=cpu --cpu-max-prime=50000 --num-threads=%s run" % core_count)
stop = time()

# Get results
sleep(5)
power = get_power("nova-6", "lyon", start, stop)
result_2 = "C-States ON, Turbo OFF, Duration: %f, Power: %f" % (stop-start, power)


# Turn on Turboboost
system("echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo")

# Run workload
start = time()
system("sysbench --test=cpu --cpu-max-prime=50000 --num-threads=%s run" % core_count)
stop = time()

# Get results
sleep(5)
power = get_power("nova-6", "lyon", start, stop)
result_3 = "C-States ON, Turbo ON, Duration: %f, Power: %f" % (stop-start, power)

# Print results
print(result_1)
print(result_2)
print(result_3)
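
The solutions repeat the same steps for each CPU configuration. As a sketch, the setup commands can be generated for every combination of C-States and Turboboost settings. The helper names are hypothetical; the shell commands are the ones used in the solutions. This enumerates all four combinations, one more than the three measured above.

```python
from itertools import product

NO_TURBO = "/sys/devices/system/cpu/intel_pstate/no_turbo"

def setup_commands(cstates_on, turbo_on):
    """Shell commands applying one CPU configuration."""
    return [
        "sudo cpupower idle-set -E" if cstates_on else "sudo cpupower idle-set -D0",
        # Writing 1 to no_turbo disables Turboboost, writing 0 enables it
        "echo %d | sudo tee %s" % (0 if turbo_on else 1, NO_TURBO),
    ]

def all_configurations():
    """Return (cstates_on, turbo_on, commands) for the four combinations."""
    return [(c, t, setup_commands(c, t)) for c, t in product([False, True], repeat=2)]

for cstates_on, turbo_on, commands in all_configurations():
    print("C-States %s, Turbo %s" % ("ON" if cstates_on else "OFF",
                                     "ON" if turbo_on else "OFF"))
    for command in commands:
        print("   ", command)
```

Each command list could then be passed to `system()` before running the workload and calling `get_power`, as in the solutions.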


Going further

  • The various monitoring devices used in Grid'5000 are presented in this page: Power Monitoring Devices
  • More details about Grid'5000 monitoring capabilities with Kwollect are available at: Monitoring Using Kwollect
  • More information about modifying CPU parameters on Grid'5000: CPU parameters
  • More information about Grid'5000 API: API
  • For more experiment scripting in Python, see Execo Practical Session