{{See also| [[CPU parameters in Grid5000]] | [[Tuning CPU parameters]]}}
{{Portal|User}}
 
{{Portal|Tutorial}}
{{TutorialHeader}}
This page [[#CPU parameters in Grid'5000 : Hyperthreading, C-State, P-State and Turboboost|describes the CPU configuration of Grid'5000 systems]] and explains [[#Setting CPU parameters: Hyperthreading, C-State, P-State and Turboboost|how to change the CPU parameters]].
__TOC__
= CPU parameters in Grid'5000 : Hyperthreading, C-State, P-State and Turboboost =


This page focuses on the following (mostly Intel) CPU parameters:
* [https://en.wikipedia.org/wiki/Hyper-threading Hyperthreading] (HT or SMT): each physical core can run multiple threads simultaneously, so that several logical processing units are exposed to the operating system (e.g. 2 threads on many Intel CPUs and recent AMD CPUs, 8 on IBM Power8, ...).
* [https://software.intel.com/en-us/articles/power-management-states-p-states-c-states-and-package-c-states C-States]: idle-state management of processors and cores.
* [https://software.intel.com/en-us/blogs/2008/05/29/what-exactly-is-a-p-state-pt-1 P-States]: Dynamic voltage and frequency scaling (DVFS).


Grid'5000 reference images:
* The '''min''' reference image is a standard Debian installation. Therefore (with Debian 9), the P-States governor is '''powersave''' for clusters using the '''intel_pstate''' module and '''ondemand''' for clusters using the '''acpi-cpufreq''' module.
* Starting with '''base''', the Grid'5000 reference images are tuned for performance. In particular, the P-States governor is '''performance''', and C-States are left at the Linux kernel default, which allows all C-States. This applies to the '''nfs''', '''big''' and '''std''' environments.


The following table presents the status of the platform:
{| class="G5KDataTable" border="1"
! | Installation date
! | Site
! | Cluster
! | CPU model
! |
! |
! |
! | Server type
! | HT enabled
! | Turboboost enabled
! | P-State driver
! | C-State driver
|-
{{:Generated/CPUParameters}}
|}                


<br/>


Up-to-date information can be found by querying the [[API|Grid'5000 Reference-API]]:
{{Term|location=fnancy|cmd=curl -k https://api.grid5000.fr/stable/sites/nancy/clusters/graoully/nodes/graoully-1?pretty}}


See also [[Hardware]] for more information about Grid'5000 hardware.


== Checking the configuration ==


{{Term|location=fnancy|cmd=<code class="command">oarsub</code><code> -I -l nodes=1,walltime=00:30 -t </code><code class="replace">deploy</code>}}
{{Term|location=fnancy|cmd=<code class="command">kadeploy3</code><code> -f $OAR_NODE_FILE -u deploy -e </code><code class="env">debian11-x64-std</code><code> -k</code>}}
{{Term|location=node|cmd=<code class="command">g5k-checks</code> -m api}}
{{Term|location=node|cmd=<code class="command">cat /tmp/*.yaml</code>}}


* The [http://execo.gforge.inria.fr/doc/latest-stable/userguide.html#check-cpu-performance-settings-of-each-grid5000-clusters Execo user guide] also provides an example for checking the CPU performance settings of Grid5000 clusters.
= Setting CPU parameters: Hyperthreading, C-State, P-State and Turboboost =
The following explains how to change the CPU parameters by using either sysfs (with root privileges) or kernel boot parameters.
* [[#Hyperthreading (HT)]]
* [[#C-States]]
* [[#P-States]]
* [[#Turboboost]]
== Hyperthreading (HT) ==
On Intel Xeon, HT provides two logical cores per physical core. With HT, one physical core appears as two processors to the operating system.
=== Checking the configuration ===
* Is the processor HT-capable?
** <code>cat /proc/cpuinfo | grep flags</code>: flag list should include ‘ht’
* Is HT enabled?
** <code>lscpu | grep 'Thread(s) per core'</code>: 1 means HT is disabled, 2 means HT is enabled
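The same information can be read from sysfs without <code>lscpu</code>. The Ruby sketch below counts, for each online CPU, the threads listed in <code>topology/thread_siblings_list</code>. It assumes these files use the kernel ''cpulist'' format (e.g. <code>0,16</code> or <code>0-1</code>), and the helper names are illustrative. A result of 1 means HT is disabled or not supported, and 2 means it is enabled.
<pre>
#!/usr/bin/env ruby
# Sketch: count hardware threads per core from sysfs (read-only).

# Expand a kernel cpulist string such as "0-3,8,10-11" into CPU numbers.
def parse_cpulist(str)
  str.strip.split(",").flat_map do |part|
    first, last = part.split("-").map { |n| Integer(n, 10) }
    last ? (first..last).to_a : [first]
  end
end

# Largest number of threads sharing one core (1 means HT is off or absent).
def threads_per_core(sibling_lists)
  sibling_lists.map { |list| parse_cpulist(list).size }.max || 0
end

cpu_root = "/sys/devices/system/cpu"
if __FILE__ == $PROGRAM_NAME && File.readable?("#{cpu_root}/online")
  online = parse_cpulist(File.read("#{cpu_root}/online"))
  lists = online.map { |cpu| "#{cpu_root}/cpu#{cpu}/topology/thread_siblings_list" }
                .select { |path| File.readable?(path) }
                .map { |path| File.read(path) }
  puts "Thread(s) per core: #{threads_per_core(lists)}"
end
</pre>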
=== Enabling/Disabling HT ===
* HT must be turned on in the BIOS configuration to enable changing the HT configuration with kernel boot parameters or as root at runtime. Every cluster installed since 2012 is configured like that.
* Note that on some clusters, the BIOS parameters might be ignored, see Bug [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=5229 #5229].
==== Setting up HT using root privilege (recommended) ====
* HT can be turned ON or OFF using the [https://www.kernel.org/doc/Documentation/cpu-hotplug.txt cpu-hotplug] capability of the Linux kernel.
* Interface: <code>/sys/devices/system/cpu/cpu*</code>
* To disable HT:
** <code>for i in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | awk -F',' '{print $2}' | sort -u); do echo 0 &gt; /sys/devices/system/cpu/cpu$i/online; done</code>
** if the processor has more than two hyperthreads per core: <code>for i in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -d ',' -f 2- | tr , '\n'); do echo 0 > /sys/devices/system/cpu/cpu$i/online; done</code>
* To enable HT:
** <code>for i in $(ls /sys/devices/system/cpu/cpu*/online); do echo $i; echo 1 &gt; $i; done</code>
'''Note:''' HT must be turned ON in the BIOS configuration, and the kernel boot parameters must not restrict the visibility of the logical cores (i.e. either no such boot parameter, or <code>maxcpus</code> and <code>additional_cpus</code> set to the number of logical cores; see below).
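The <code>awk</code> command above assumes that every sibling list is a comma-separated pair. The Ruby sketch below is an alternative that also accepts the range form (<code>0-1</code>) and more than two threads per core. It keeps the lowest-numbered thread of each core online and selects all the other threads. By default it only prints the corresponding <code>echo</code> commands. The <code>--apply</code> argument is a hypothetical flag of this sketch: with it, the script writes to sysfs and must run as root.
<pre>
#!/usr/bin/env ruby
# Sketch: disable HT at runtime through CPU hotplug (dry-run by default).

# Expand a kernel cpulist string such as "0-3,8" into CPU numbers.
def parse_cpulist(str)
  str.strip.split(",").flat_map do |part|
    first, last = part.split("-").map { |n| Integer(n, 10) }
    last ? (first..last).to_a : [first]
  end
end

# Every thread except the lowest-numbered one of each core.
def secondary_threads(sibling_lists)
  sibling_lists.flat_map { |list| parse_cpulist(list).sort.drop(1) }.uniq.sort
end

if __FILE__ == $PROGRAM_NAME
  apply = ARGV.include?("--apply") # hypothetical flag; omit it for a dry run
  pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
  lists = Dir.glob(pattern).map { |path| File.read(path) }
  secondary_threads(lists).each do |cpu|
    online = "/sys/devices/system/cpu/cpu#{cpu}/online"
    if apply
      File.write(online, "0") # requires root
    else
      puts "echo 0 > #{online}"
    end
  end
end
</pre>
Re-enabling HT works as described above, by writing 1 to every <code>online</code> file.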
==== Kernel boot command-line parameters (alternative) ====
HT settings can also be controlled with the kernel boot parameter <code>maxcpus=n</code>, which restricts the number of CPUs brought up at boot time to <code>n</code>. It can be used to turn off HT: for example, using <code>maxcpus=4</code> at the boot of a quad-core machine disables HT properly because each of the first 4 logical cores corresponds to a different physical core. See bug [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=5229 #5229] for additional information.
With Kadeploy, you can specify kernel boot parameters on the environment description file:
* Get the description of '''debian11-x64-base''':
<pre>kaenv3 -p debian11-x64-base -u deploy > mydebian-x64-base.env</pre>
* Edit the '''mydebian-x64-base.env''' and add a '''kernel_params''' entry within the existing '''boot''' entry of the YAML file:
<pre>boot:
  kernel: "/vmlinuz"
  initrd: "/initrd.img"
  kernel_params: maxcpus=4
</pre>
* Deploy the environment:
<pre>
oarsub -I -t deploy -l nodes=1,walltime=1
kadeploy3 -f $OAR_NODEFILE -a mydebian-x64-base.env -k
</pre>
* Note that the kernel parameters specified in the environment description file are appended to the default kernel parameters of the cluster.
* You can check the kernel parameters with <code>cat /proc/cmdline</code>
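When checking several parameters at once, parsing <code>/proc/cmdline</code> can be easier than reading it by eye. The following minimal Ruby sketch assumes that parameter values contain no quoted spaces, and its helper name is illustrative.
<pre>
#!/usr/bin/env ruby
# Sketch: show whether maxcpus (or another boot parameter) reached the kernel.

# Split a kernel command line into {name => value}; bare flags map to true.
# Later occurrences overwrite earlier ones. Quoted values are not handled.
def parse_cmdline(cmdline)
  cmdline.split.each_with_object({}) do |token, params|
    name, value = token.split("=", 2)
    params[name] = value.nil? ? true : value
  end
end

if __FILE__ == $PROGRAM_NAME && File.readable?("/proc/cmdline")
  name = ARGV.fetch(0, "maxcpus")
  params = parse_cmdline(File.read("/proc/cmdline"))
  puts "#{name}: #{params.fetch(name, 'not set')}"
end
</pre>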
=== References ===
* General documentation:
** [https://www.kernel.org/doc/Documentation/kernel-parameters.txt Kernel boot parameters documentation]
** [http://www.faqs.org/docs/Linux-HOWTO/BootPrompt-HOWTO.html BootPrompt-HOWTO: a compilation of kernel boot parameters]
* CPU Hotplug:
** [https://www.kernel.org/doc/Documentation/cpu-hotplug.txt Kernel cpu-hotplug.txt]
* Methods for enabling/disabling HT:
** [http://serverfault.com/questions/235825/disable-hyperthreading-from-within-linux-no-access-to-bios Serverfault.com: Several methods for disabling HT without BIOS access]
** [http://www.bigdatamark.com/disabling-hyperthreading-in-centosrhel/ Blog entry about CPU hotplug]
* Related bugs:
** [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=5229 #5229]: Hyperthreading is activated on the parapide cluster even if it is disabled in bios (fixed)
== C-States ==
[https://software.intel.com/en-us/articles/power-management-states-p-states-c-states-and-package-c-states C-States] are power modes that put various processor subsystems to sleep '''when the CPU is idle'''.
Deeper C-States save more power but take longer to bring the CPU back to an active state. The C-States are:
* '''C0''': the CPU is actively running code (i.e. it is the non-idle state)
* '''C1''': the CPU is idle but quick to wake up
* '''C2''' and up: extra power saving states
There is also a C-State called '''C1E''' (Enhanced C1): when C1E is enabled in both the BIOS and the OS configuration, it replaces C1 and additionally lowers the CPU frequency and voltage.
The [http://en.community.dell.com/cfs-file.ashx/__key/telligent-evolution-components-attachments/13-4491-00-00-20-22-77-64/Controlling_5F00_Processor_5F00_C_2D00_State_5F00_Usage_5F00_in_5F00_Linux_5F00_v1.1_5F00_Nov2013.pdf Dell whitepaper about Controlling Processor C-State Usage in Linux] is a great resource to learn about C-States.
C-States are managed by the operating system using the [https://lwn.net/Articles/384146/ '''cpuidle'''] subsystem and an idle driver (either <code>intel_idle</code> or <code>acpi_idle</code>). The default driver is <code>intel_idle</code> on recent kernels and Intel hardware.
=== C-States Drivers ===
==== <code>intel_idle</code> driver ====
* This driver does not use ACPI. It directly uses knowledge of Intel CPU hardware.
* The C-States list of this driver might differ from the list provided by ACPI.
* C1E can be disabled through this driver, since it appears as a separate entry in the C-State list.
==== <code>acpi_idle</code> driver ====
* This driver is used when <code>intel_idle</code> is disabled.
* It takes into account both the BIOS parameters and the kernel parameters.
* Procfs interface (older kernels only; it has since been removed, see below): <code>cat /proc/acpi/processor/CPUx/power</code>
* Companion tool: <code>acpitool -c</code>
=== Checking the configuration (Sysfs) ===
* Driver is in use:
** <code>cat /sys/devices/system/cpu/cpuidle/current_driver</code>
* Name and Latency of C-States:
** <code>cat /sys/devices/system/cpu/cpu*/cpuidle/state*/name</code>
** <code>cat /sys/devices/system/cpu/cpu*/cpuidle/state*/latency</code>
* Idle state statistics:
** <code>/sys/devices/system/cpu/cpu*/cpuidle/state*/usage</code>
** Idle state statistics can be retrieved more easily with cpupower (see below).
* Sysfs also provides an interface that reports whether C-States are disabled. However, there are multiple ways to disable C-States, so '''you cannot rely on it'''. The most reliable way to check whether C-States are enabled or disabled is to '''monitor the CPU idle state statistics'''. Note also that C-States can be disabled independently of each other. For the record, here are the sysfs interfaces that report disabled C-States:
** Are C-States disabled? <code>cat /sys/devices/system/cpu/cpu*/cpuidle/state*/disable</code>.
*** This is only correct if the C-States were disabled with the sysfs interface. It is not correct when the CPU latency is limited with <code>/dev/cpu_dma_latency</code>.
** Max C-States allowed by the intel_idle driver: <code>cat /sys/module/intel_idle/parameters/max_cstate</code>.
*** It '''only''' provides the value corresponding to the <code>intel_idle.max_cstate</code> kernel parameter.
* The <code>/proc/acpi/processor/*/power</code> interface has been removed from the kernel.
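Following the advice above, the idle statistics can be sampled directly. The Ruby sketch below reads the cumulative <code>time</code> counter of every <code>cpuidle</code> state of one CPU twice. This counter holds the microseconds spent in the state. From the two readings, the sketch reports the share of the interval spent in each state, which is roughly what <code>cpupower monitor -m Idle_Stats</code> shows. It has the same inaccuracy described in the <code>cpupower</code> documentation below, because the kernel only updates these counters when a state is entered or left. The helper names and the 1-second interval are assumptions.
<pre>
#!/usr/bin/env ruby
# Sketch: idle-state residency of one CPU over a short interval (read-only).

# Cumulative time (in microseconds) spent in each idle state: {name => time}.
def read_idle_times(cpu, root = "/sys/devices/system/cpu")
  Dir.glob("#{root}/cpu#{cpu}/cpuidle/state[0-9]*")
     .sort_by { |dir| dir[/\d+\z/].to_i }
     .map do |dir|
       name = File.read(File.join(dir, "name")).strip
       [name, Integer(File.read(File.join(dir, "time")).strip, 10)]
     end.to_h
end

# Percentage of +interval_s+ seconds spent in each state between two snapshots.
def residency(before, after, interval_s)
  interval_us = interval_s * 1_000_000.0
  after.map { |name, t| [name, 100.0 * (t - before.fetch(name, 0)) / interval_us] }.to_h
end

if __FILE__ == $PROGRAM_NAME
  cpu = Integer(ARGV.fetch(0, "0"), 10)
  interval = 1
  before = read_idle_times(cpu)
  sleep interval
  residency(before, read_idle_times(cpu), interval).each do |name, pct|
    printf("cpu%d %-10s %6.2f%%\n", cpu, name, pct)
  end
end
</pre>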
=== Checking the configuration (Tools) ===
'''Cpupower''' retrieves CPU information from the sysfs interface (apt-get install linux-cpupower):
* <code>cpupower idle-info</code> works for both <code>intel_idle</code> and <code>acpi_idle</code> and provides the same information as <code>/sys/devices/system/cpu/cpu*/cpuidle/state*/</code>.
* <code>cpupower monitor -m Idle_Stats</code> gives idle state statistics and is a reliable way to check whether C-States are enabled. In the following example, C-States are fully enabled and the CPU is idle:
<pre>root@graphene-143:~# cpupower monitor -m Idle_Stats
    |Idle_Stats                       
CPU | POLL | C1-N | C1E- | C3-N | C6-N
  0|  0.00|  0.00|  0.00|  0.00| 53.04
  1|  0.00|  0.00|  0.00|  0.00| 95.43
  2|  0.00|  0.00|  0.00|  0.04| 76.96
  3|  0.00|  0.00|  0.00|  0.00| 99.97
</pre>
* [http://manpages.ubuntu.com/manpages/trusty/man1/cpupower-idle-info.1.html Documentation and limitation of the monitor]:
<pre>
Idle_Stats
    Shows  statistics  of  the  cpuidle  kernel  subsystem.  Values  are  retrieved  from  /sys/devices/sys‐
    tem/cpu/cpu*/cpuidle/state*/.  The  kernel  updates  these values every time an idle state is entered or
    left. Therefore there can be some inaccuracy when cores are in an idle state for some time when the  mea‐
    sure starts or ends. In worst case it can happen that one core stayed in an idle state for the whole mea‐
    sure time and the idle state usage time as exported by the kernel did not get updated.  In  this  case  a
    state residency of 0 percent is shown while it was 100.</pre>
'''Extra tools you might find useful:'''
* <code>i7z</code>: can be used to check C-state usage regardless of which idle driver is being used. It uses [https://en.wikipedia.org/wiki/Model-specific_register MSR] information.
* <code>[https://en.wikipedia.org/wiki/PowerTOP powertop]</code>: tool to find out what is using power
* <code>turbostat</code>: report processor frequency and idle statistics
* <code>[https://www.open-mpi.org/projects/hwloc/ hwloc]</code>: detect the hierarchical topology of the hardware architectures
=== Enabling/Disabling C-States ===
Note that disabling C-States entirely (i.e. only allowing C0) interferes with HT: HT should be disabled when the CPU is forced to stay in the C0 state.
==== Dynamic Control of the C-States using root privilege (recommended) ====
* C-States can be disabled on a per core and per C-State basis:
** <code>echo 1 &gt; /sys/devices/system/cpu/cpu0/cpuidle/state3/disable</code>
** or <code>cpupower idle-set -d 3</code>
* To re-enable a C-State:
** <code>echo 0 &gt; /sys/devices/system/cpu/cpu0/cpuidle/state3/disable</code>
** or <code>cpupower idle-set -e 3</code>
* You can also limit the allowed C-States by using the Power Management Quality of Service (PM QoS) interface: requesting a low latency prevents the processor from entering deep sleep states.
** The file <code>/dev/cpu_dma_latency</code> can be used to set a maximum allowable latency. Write the maximum allowed response time in microseconds to this file as a binary 32-bit integer; a value written as text is parsed by the kernel as a hexadecimal number.
** This file must be kept open as long as you want to limit the latency.
** The latency of each C-State is given by <code>/sys/devices/system/cpu/cpu*/cpuidle/state*/latency</code> (in microseconds). A value of 0 only allows C0.
** Setting a maximum latency does not update <code>/sys/devices/system/cpu/cpu0/cpuidle/state*/disable</code> or <code>/sys/module/intel_idle/parameters/max_cstate</code>.
** More information can be found [https://access.redhat.com/articles/65410 here] and [http://en.community.dell.com/cfs-file.ashx/__key/telligent-evolution-components-attachments/13-4491-00-00-20-22-77-64/Controlling_5F00_Processor_5F00_C_2D00_State_5F00_Usage_5F00_in_5F00_Linux_5F00_v1.1_5F00_Nov2013.pdf here].
** Here is a Ruby example for using <code>/dev/cpu_dma_latency</code>:
<pre>
#!/usr/bin/env ruby
# Limit C-States through PM QoS: the request stays active while the file is open.
if ARGV.empty?
  puts "Usage: sudo ./limit_cstates.rb <latency in microseconds>"
  exit 1
end
latency = Integer(ARGV[0], 10)

f = File.open("/dev/cpu_dma_latency", "wb")
# Write a raw 32-bit integer: a text value such as "80" would be parsed as hexadecimal.
f.syswrite([latency].pack("l"))

Signal.trap("TERM") { exit } # turn SIGTERM into a normal exit so that ensure runs
begin
  sleep # keep the file open (and the latency request active) until interrupted
rescue Interrupt
  # Ctrl-C: nothing to do, the ensure clause releases the request
ensure
  f.close # closing the file removes the latency request
end
</pre>
On taurus, limiting the latency to 80 µs (C3) seems to also allow C-States up to C7. On graphene, however, limiting the latency to C3 works as expected.
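State indices are not portable across CPU models. <code>state3</code> is C3 on the graphene example above, but it may be a different state on other CPUs. A PM QoS latency request keeps only the states whose exit latency fits under the bound. The Ruby sketch below reads the name and latency of each idle state of cpu0, finds a state by name, and lists the states that a given latency bound would still allow. As far as we understand it, this is roughly the comparison that the cpuidle governors make. The <code>echo</code> example above only targets cpu0, whereas the commands this sketch prints cover every CPU. Names such as <code>state_index</code> are illustrative, and nothing is written to sysfs.
<pre>
#!/usr/bin/env ruby
# Sketch: look up cpuidle states by name and exit latency (read-only).

CPU_ROOT = "/sys/devices/system/cpu"

# Ordered [name, exit latency in microseconds] pairs for one CPU (index = N in stateN).
def idle_states(cpu = 0, root = CPU_ROOT)
  Dir.glob("#{root}/cpu#{cpu}/cpuidle/state[0-9]*")
     .sort_by { |dir| dir[/\d+\z/].to_i }
     .map do |dir|
       [File.read(File.join(dir, "name")).strip,
        Integer(File.read(File.join(dir, "latency")).strip, 10)]
     end
end

# Index N of the first state whose name starts with +prefix+ (e.g. "C6"), or nil.
def state_index(states, prefix)
  states.index { |name, _latency| name.start_with?(prefix) }
end

# Names of the states whose exit latency does not exceed +max_latency_us+.
def allowed_states(states, max_latency_us)
  states.select { |_name, latency| latency <= max_latency_us }.map(&:first)
end

if __FILE__ == $PROGRAM_NAME
  prefix = ARGV.fetch(0, "C6")             # state to look up by name
  bound = Integer(ARGV.fetch(1, "80"), 10) # latency bound in microseconds
  states = idle_states
  states.each_with_index { |(name, latency), i| puts "state#{i}: #{name} (#{latency} us)" }
  index = state_index(states, prefix)
  if index
    # Print (do not run) the commands that would disable this state on every CPU.
    Dir.glob("#{CPU_ROOT}/cpu[0-9]*/cpuidle/state#{index}/disable").sort.each do |path|
      puts "echo 1 > #{path}"
    end
  end
  puts "Allowed with a #{bound} us bound: #{allowed_states(states, bound).join(' ')}"
end
</pre>
Consider hypothetical states POLL (0 µs), C1 (2), C1E (10), C3 (80) and C6 (104). With a bound of 80, <code>allowed_states</code> keeps every state up to C3. With a bound of 128 (0x80), it also admits C6. The text <code>80</code> written to <code>/dev/cpu_dma_latency</code> is parsed as 0x80, so this difference is worth ruling out when a latency limit seems to allow deeper states than expected.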
==== Kernel boot command-line parameters ====
===== <code>intel_idle</code> driver =====
* This driver mostly ignores BIOS settings and kernel parameters, but in newer kernels <code>idle=halt</code> automatically disables cpuidle, including intel_idle.
* <code>intel_idle.max_cstate=0</code> disables intel_idle, and the kernel falls back to acpi_idle.
* <code>intel_idle.max_cstate=[1-6]</code> specifies the maximum depth of C-states.
===== <code>acpi_idle</code> driver =====
* The depth of C-States can be set with <code>processor.max_cstate=n</code>
* Note that when <code>processor.max_cstate=0</code> is used, the kernel actually silently sets it to 1.
===== Other kernel parameters =====
* <code>idle=halt</code>: allows C0-C1(E) (''Halt'' means C1). It allows for low latency.
* <code>idle=poll</code>: CPU will stay in C0 (''Poll'' means C0). It allows for extremely low latency: The processor will stay in C0 and kept busy in a loop. It increases power usage considerably.
* <code>idle=mwait</code>: has been [http://stackoverflow.com/questions/12111954/context-switches-much-slower-in-new-linux-kernels removed] from 3.x kernels.
=== Influence of C-States on power usage, network latency and P-States ===
C-States are a power saving feature of the CPU. Here are some power usage measurements performed on taurus ({{Bug|6570}}):
* C0/POLL (0 µs): 183 W
* C1S (2 µs): 144 W
* C1E (10 µs): 113 W
C-States affect CPU wake-up latencies and therefore also network latency when the nodes are not busy (see {{Bug|5368}} for an example). You should disable C-States for maximum performance. Here are some ping latency measurements on the InfiniBand interface of graphene when nodes are idle:
* C0 (0 µs): ping in ~ 0.030 ms
* C1 (3 µs): ping in ~ 0.030 ms
* C1E (10 µs): ping in ~ 0.045 ms
* C3 (20 µs): ping in ~ 0.140 ms
* C6 (200 µs): ping in ~ 0.200 ms
=== References ===
* [https://www.kernel.org/doc/Documentation/cpuidle/sysfs.txt Kernel CPU idle levels]
* [http://en.community.dell.com/cfs-file.ashx/__key/telligent-evolution-components-attachments/13-4491-00-00-20-22-77-64/Controlling_5F00_Processor_5F00_C_2D00_State_5F00_Usage_5F00_in_5F00_Linux_5F00_v1.1_5F00_Nov2013.pdf Dell whitepaper about Controlling Processor C-State Usage in Linux]
* [http://manpages.ubuntu.com/manpages/wily/man1/cpupower-idle-set.1.html Man cpupower-idle-set]
* [http://manpages.ubuntu.com/manpages/wily/man1/cpupower.1.html Man cpupower]
* [http://www.breakage.org/2012/11/14/processor-max_cstate-intel_idle-max_cstate-and-devcpu_dma_latency/ Blog entry about processor.max_cstate, intel_idle.max_cstate and /dev/cpu_dma_latency]
* [http://nidirondel.free.fr/llibre/noyau.htm Notes about C-States]
* [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=5368 #5368] IPoIB latency on graphene (fixed)
== P-States ==
Modern CPUs support [https://en.wikipedia.org/wiki/Dynamic_frequency_scaling dynamic frequency scaling] to reduce both the electric energy consumption and the heat generated by the processor. The P-States are the various frequency settings supported by the CPU. C-States and P-States are independent of each other. To understand the difference between C-States and P-States, you can read [https://software.intel.com/en-us/blogs/2008/03/12/c-states-and-p-states-are-very-different this] or [http://searchservervirtualization.techtarget.com/answer/Differentiating-C-state-and-P-state-in-server-power-management this].
P-States are managed by [https://www.kernel.org/doc/Documentation/cpu-freq/ cpu-freq] and kernel drivers (either <code>intel_pstate</code> or <code>acpi-cpufreq</code>). The default driver is <code>intel_pstate</code> on recent kernels and Intel hardware. Each driver implements several ''[https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt governors]'', i.e. policies for the CPU frequency scaling algorithms.
=== P-States drivers ===
==== <code>acpi_cpufreq</code> driver ====
This driver supports five governors. The <code>performance</code>, <code>powersave</code> and <code>userspace</code> governors set the frequency statically, whereas the <code>ondemand</code> and <code>conservative</code> governors set the CPU frequency depending on the current CPU usage (dynamic scaling).
{| class="wikitable"
|-
! Governor !! Description
|-
| performance || Sets the CPU frequency to the value defined in <code>/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq</code>
|-
| powersave || Sets the CPU frequency to the value defined in <code>/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq</code>
|-
| userspace || Sets the CPU frequency to the value defined in <code>/sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed</code>
|-
| ondemand || Sets the CPU frequency to <code>scaling_max_freq</code> when the CPU load threshold (default is 95%) is reached.
|-
| conservative || Same as ''ondemand'' but increases the frequency by steps. The default threshold is 80%.
|}
Those governors are described [https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt in the Linux kernel documentation]. The <code>ondemand</code> and <code>conservative</code> governors can be finely tuned using the sysfs parameters described in the documentation (<code>/sys/devices/system/cpu/cpufreq/<governor>/*</code> appears when the governor is in use).
==== <code>intel_pstate</code> driver ====
* This [https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt driver] supports two governors: <code>performance</code> and <code>powersave</code>.
* The <code>performance</code> governor is similar to the acpi_cpufreq <code>performance</code> governor, but the policy of the intel_pstate <code>powersave</code> governor depends on the CPU usage: it roughly corresponds to the <code>ondemand</code> governor of the acpi_cpufreq driver.
* The meaning of the data in <code>/sys/devices/system/cpu/cpu*/cpufreq/</code> differs from the acpi_cpufreq driver. See [https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt intel-pstate.txt] for more information.
* The driver also provides its own sysfs interface within <code>/sys/devices/system/cpu/intel_pstate/</code>.
=== Checking the configuration ===
<code>cpupower frequency-info</code> provides the cpufreq kernel information in a consolidated manner. The information is gathered from the sysfs <code>/sys/devices/system/cpu/cpu*/cpufreq/</code> interface, which is described [https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt here].
Driver and hardware capability:
* <code>scaling_driver</code> is the driver in use.
* <code>scaling_available_governors</code> lists the available governors for the driver in use.
* <code>scaling_available_frequencies</code> lists the frequencies that are available with your CPU model (acpi-cpufreq only).
* <code>cpuinfo_min_freq</code> and <code>cpuinfo_max_freq</code> give the frequency range capability of the hardware.
P-States configuration:
* <code>scaling_governor</code> is the current governor.
* <code>scaling_max_freq</code> and <code>scaling_min_freq</code>: current frequency range limit used by the governor. When setting a policy, you need to set scaling_max_freq first, then scaling_min_freq.
* <code>cpuinfo_cur_freq</code>: current frequency of the CPU as obtained from the hardware, in kHz.
* <code>scaling_cur_freq</code>: frequency the kernel thinks the CPU runs at. The Intel driver shows the frequency in use, but acpi-cpufreq displays the requested frequency (this might change in the future). The same is true for <code>/proc/cpuinfo</code>.
* <code>scaling_setspeed</code> (write-only): is used by the userspace governor of the acpi-cpufreq driver. Irrelevant for other governors or the intel driver.
<code>cpufrequtils</code> is unmaintained and deprecated. Use <code>cpupower</code> instead (package <code>sys-power/cpupower</code> on Gentoo, <code>linux-cpupower</code> on Debian).
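A quick way to spot CPUs whose settings differ, such as a single core left on another governor, is to compare the sysfs attributes of all CPUs. The Ruby sketch below reads a few of the attributes listed above for every CPU and groups identical configurations. It only reads sysfs, and both the selected attributes and the helper names are illustrative. Frequencies are shown in kHz, as sysfs reports them.
<pre>
#!/usr/bin/env ruby
# Sketch: summarize cpufreq settings across all CPUs (read-only).

ATTRS = %w[scaling_driver scaling_governor scaling_min_freq scaling_max_freq].freeze

# {cpu number => {attribute => value}} for every CPU exposing cpufreq.
def read_cpufreq(root = "/sys/devices/system/cpu")
  Dir.glob("#{root}/cpu[0-9]*/cpufreq").map do |dir|
    cpu = Integer(dir[%r{cpu(\d+)/cpufreq\z}, 1], 10)
    settings = ATTRS.map { |attr| [attr, File.read(File.join(dir, attr)).strip] }.to_h
    [cpu, settings]
  end.to_h
end

# {settings => [cpu numbers]}: one entry per distinct configuration.
def group_by_settings(per_cpu)
  per_cpu.group_by { |_cpu, settings| settings }
         .transform_values { |pairs| pairs.map(&:first).sort }
end

if __FILE__ == $PROGRAM_NAME
  group_by_settings(read_cpufreq).each do |settings, cpus|
    puts "CPUs #{cpus.join(',')}: #{settings.map { |k, v| "#{k}=#{v}" }.join(' ')}"
  end
end
</pre>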
=== Setting up P-States using root privileges ===
The P-States configuration can be modified using <code>cpupower frequency-set</code> or by echoing into the following files: <code>scaling_governor</code>, <code>scaling_max_freq</code>, <code>scaling_min_freq</code> and <code>scaling_setspeed</code>. Modifying some parameters might be irrelevant for the current driver or the current governor.
<pre>
$ cpupower frequency-set -g ondemand # set governor
$ cpupower frequency-set -u 2.60GHz    # set the maximum allowed frequency
$ cpupower frequency-set -d 1.80GHz    # set the minimum allowed frequency
$ cpupower frequency-info # check the configuration
</pre>
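Writing to <code>scaling_governor</code> directly is an alternative to <code>cpupower frequency-set -g</code>. The kernel rejects any governor that the current driver does not provide, for example <code>ondemand</code> under <code>intel_pstate</code>. The Ruby sketch below therefore checks the requested governor against <code>scaling_available_governors</code> first. It then prints the write for every CPU, or performs it as root when given the hypothetical <code>--apply</code> argument. Without arguments it does nothing.
<pre>
#!/usr/bin/env ruby
# Sketch: set the scaling governor of every CPU through sysfs (dry-run by default).

# Return +requested+ if the driver offers it, otherwise raise ArgumentError.
def check_governor(requested, available)
  governors = available.split
  unless governors.include?(requested)
    raise ArgumentError, "#{requested} not in: #{governors.join(' ')}"
  end
  requested
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  apply = ARGV.include?("--apply") # hypothetical flag; omit it for a dry run
  root = "/sys/devices/system/cpu"
  available = File.read("#{root}/cpu0/cpufreq/scaling_available_governors")
  governor = check_governor((ARGV - ["--apply"]).first, available)
  Dir.glob("#{root}/cpu[0-9]*/cpufreq/scaling_governor").sort.each do |path|
    if apply
      File.write(path, governor) # requires root
    else
      puts "echo #{governor} > #{path}"
    end
  end
end
</pre>
For example, <code>sudo ruby set_governor.rb performance --apply</code> (hypothetical file name) sets the <code>performance</code> governor on every CPU.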
On [https://wiki.debian.org/HowTo/CpuFrequencyScaling Debian] systems, you can make changes permanent by editing <code>/etc/default/cpufrequtils</code>, the configuration file of the <code>cpufrequtils</code> daemon. Other distributions might use another daemon (like cpupower with <code>/etc/default/cpupower</code>). You can also use sysfsutils and <code>/etc/sysfs.conf</code> to manage the configuration.
<pre>
# /etc/default/cpufrequtils
governor="ondemand"
max_freq="2.60GHz"
min_freq="1.80GHz"
</pre>
[https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt Intel driver specificities]:
* The <code>scaling_governor</code> is the same for each CPU. The last requested policy is applicable to all CPUs.
* <code>scaling_max_freq</code> and <code>scaling_min_freq</code> can be used to set the P-State range of the CPUs. However, frequencies are converted to the nearest available P-State, so this is subject to rounding errors. You can instead use the <code>min_perf_pct</code> and <code>max_perf_pct</code> parameters in <code>/sys/devices/system/cpu/intel_pstate/</code>.
=== References ===
* [http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf P-States at the LinuxCon Europe 2015]
* [https://www.kernel.org/doc/Documentation/cpu-freq/ Linux kernel cpu-freq documentation]
* [https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt Documentation of the intel-pstate driver]
* [https://github.com/pyamsoft/pstate-frequency pstate-frequency: an utility for controlling P-States with the intel_pstate driver]
* [http://askubuntu.com/questions/698195/how-to-make-cpugovernor-intel-pstate-stable Questions about the intel_pstate driver]
* [https://plus.google.com/+ArjanvandeVen/posts/dLn9T4ehywL Blog entry about Some basics on CPU P-States on Intel processors]
* [http://www.phoronix.com/scan.php?page=article&item=intel_pstate_linux315&num=1 A performance comparison of the ACPI CPUfreq and Intel P-State drivers]
=== Related bugs ===
* [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=5327 #5327] Low CPU frequency after deployment (fixed)
* [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=5368 #5368] IPoIB latency on graphene (fixed)
* [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=6281 #6281] Econome problem on P-State (fixed)
* [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=6397 #6397] RefAPI CPU Mhz (fixed)
* [https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=6570 #6570] Power consumption with jessie-std (fixed)
==  Turboboost ==
[https://en.wikipedia.org/wiki/Intel_Turbo_Boost Turboboost] allows cores to run above their normal operating frequency when the CPU demand is high and as long as [http://www.intel.fr/content/www/fr/fr/architecture-and-technology/turbo-boost/turbo-boost-technology.html the electrical or thermal limits] stays below the specification limits.
=== Checking the configuration ===
The Sysfs interface depends on the P-State driver in use:
* To check the driver in use: <code>cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver</code>
* With the <code>intel pstate</code> driver: <code>cat /sys/devices/system/cpu/intel_pstate/no_turbo</code> (1 == disabled)
* With the <code>acpi-cpufreq</code> driver: <code>cat /sys/devices/system/cpu/cpufreq/boost</code> (1 == enabled)
The <code>cpupower frequency-info</code> tool also provides information about Turboboost:
<pre>  boost state support:
    Supported: yes
    Active: yes</pre>
=== Enabling/Disabling Turboboost ===
Using the Sysfs interface:
* With the <code>acpi-cpufreq</code> driver: <code>echo 0 &gt; /sys/devices/system/cpu/cpufreq/boost</code> (0 == disabled)
* With the <code>intel pstate</code> driver: <code>echo 1 &gt; /sys/devices/system/cpu/intel_pstate/no_turbo</code> (1 == disabled)
With the <code>acpi-cpufreq</code>, Turboboost can also be disabled by setting manually the maximum CPU frequency as Turboboost is represented by a CPU frequency in the <code>scaling_available_frequencies</code> list.
In any case, Turboboost can also be disabled by using [https://en.wikipedia.org/wiki/Model-specific_register MSR]. See [http://luisjdominguezp.tumblr.com/post/19610447111/disabling-turbo-boost-in-linux this page] or [http://notepad2.blogspot.fr/2014/11/a-script-to-turn-off-intel-cpu-turbo.html this page] for more information.
=== References ===
* [https://www.kernel.org/doc/Documentation/cpu-freq/boost.txt Linux Kernel documentation: Processor boosting control]

Latest revision as of 16:57, 23 August 2023

Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

This page describes the CPU configuration of Grid'5000 systems and explains how to change the CPU parameters.

CPU parameters in Grid'5000 : Hyperthreading, C-State, P-State and Turboboost

This page focuses on the following (mostly Intel) CPU parameters:

  • Hyperthreading (HT or SMT): each physical core can run multiple threads simultaneously so that several logical processing units are exposed to the operating system (e.g. 2 threads on many Intel CPUs and recent AMD CPUs, 8 on IBM Power8, ...).
  • C-States: idle-state management of processors and cores.
  • P-States: Dynamic voltage and frequency scaling (DVFS).
  • Turboboost: allows cores to run faster than their original frequency if they are operating below power and temperature specification limits.

Grid'5000 configuration

Note that the CPU configuration depends on both the BIOS configuration and the operating system.

Default BIOS configuration on Grid'5000:

  • HT and Turboboost are enabled on every cluster installed since 2012. This corresponds to clusters with Intel CPUs using Sandy Bridge, Haswell or later architectures.
  • P-States and C-States (including C1E) are enabled but managed by the operating system (see below).

Grid'5000 reference images:

  • The min reference image is a standard Debian installation. Therefore (with Debian 9), the P-States governor is powersave for clusters using the intel_pstate driver and ondemand for clusters using the acpi-cpufreq driver.
  • Starting with base, the Grid'5000 reference images are tuned for performance. In particular, the P-States governor is 'performance' and the Linux kernel's default of allowing all C-States is kept. This applies to the nfs, big and std environments.

The following table presents the status of the platform:

Installation date | Manufacturing date | Site | Cluster | CPU Family | CPU Version | Microarchitecture | Frequency | Server type | HT enabled | Turboboost enabled | P-State driver | C-State driver
2006-07-01 | 2006-07-01 | lyon | sagittaire | AMD Opteron | 250 | K8 | 2.4 GHz | Sun Fire V20z | no | no | none | acpi_idle
2011-01-04 | 2011-01-04 | sophia | uvb | Intel Xeon | X5670 | Westmere | 2.95 GHz | Dell PowerEdge C6100 | yes | no | none | intel_idle
2012-09-14 | 2012-07-16 | lyon | orion | Intel Xeon | E5-2630 | Sandy Bridge | 2.3 GHz | Dell PowerEdge R720 | yes | yes | intel_cpufreq | intel_idle
2012-09-14 | 2012-07-16 | lyon | taurus | Intel Xeon | E5-2630 | Sandy Bridge | 2.3 GHz | Dell PowerEdge R720 | yes | yes | intel_cpufreq | intel_idle
2012-10-02 | 2012-07-19 | lyon | hercule | Intel Xeon | E5-2620 | Sandy Bridge | 2.0 GHz | Dell PowerEdge C6220 | yes | yes | intel_cpufreq | intel_idle
2013-09-10 | 2012-10-30 | luxembourg | petitprince | Intel Xeon | E5-2630L | Sandy Bridge | 2.0 GHz | Dell PowerEdge M620 | yes | yes | intel_cpufreq | intel_idle
2014-04-16 | 2012-06-28 | nantes | econome | Intel Xeon | E5-2660 | Sandy Bridge | 2.2 GHz | Dell PowerEdge C6220 | yes | yes | intel_cpufreq | intel_idle
2015-01-13 | 2015-01-13 | rennes | parasilo | Intel Xeon | E5-2630 v3 | Haswell | 2.4 GHz | Dell PowerEdge R630 | yes | yes | intel_cpufreq | intel_idle
2016-12-01 | 2016-10-28 | lyon | nova | Intel Xeon | E5-2620 v4 | Broadwell | 2.1 GHz | Dell PowerEdge R430 | yes | yes | intel_cpufreq | intel_idle
2017-06-26 | 2017-06-07 | nancy | grele | Intel Xeon | E5-2650 v4 | Broadwell | 2.2 GHz | Dell PowerEdge R730 | yes | yes | intel_cpufreq | intel_idle
2017-10-16 | 2017-07-11 | nantes | ecotype | Intel Xeon | E5-2630L v4 | Broadwell | 1.8 GHz | Dell PowerEdge R630 | yes | yes | intel_cpufreq | intel_idle
2018-01-16 | 2017-12-26 | grenoble | yeti | Intel Xeon | Gold 6130 | Skylake-SP | 2.1 GHz | Dell PowerEdge R940 | yes | yes | intel_pstate | intel_idle
2018-03-22 | 2017-12-12 | grenoble | dahu | Intel Xeon | Gold 6130 | Skylake-SP | 2.1 GHz | Dell PowerEdge C6420 | yes | yes | intel_pstate | intel_idle
2018-04-11 | 2018-04-01 | nancy | grvingt | Intel Xeon | Gold 6130 | Skylake-SP | 2.1 GHz | Dell PowerEdge C6420 | yes | yes | intel_pstate | intel_idle
2018-08-01 | 2018-07-17 | lille | chifflot | Intel Xeon | Gold 6126 | Skylake-SP | 2.6 GHz | Dell PowerEdge R740 | yes | yes | intel_pstate | intel_idle
2018-08-06 | 2018-07-27 | lille | chiclet | AMD EPYC | 7301 | Zen | 2.2 GHz | Dell PowerEdge R7425 | yes | yes | acpi-cpufreq | acpi_idle
2019-06-07 | 2019-05-27 | nancy | graffiti | Intel Xeon | Silver 4110 | Skylake-SP | 2.1 GHz | Dell PowerEdge T640 | yes | yes | intel_pstate | intel_idle
2019-09-01 | 2019-06-12 | lyon | gemini | Intel Xeon | E5-2698 v4 | Broadwell | 2.2 GHz | Nvidia DGX-1 | yes | yes | intel_cpufreq | intel_idle
2019-09-04 | 2019-07-16 | nancy | gros | Intel Xeon | Gold 5220 | Cascade Lake-SP | 2.2 GHz | Dell PowerEdge R640 | yes | yes | intel_pstate | intel_idle
2019-11-25 | 2019-11-15 | nancy | grue | AMD EPYC | 7351 | Zen | 2.4 GHz | Dell PowerEdge R7425 | yes | yes | acpi-cpufreq | acpi_idle
2019-12-23 | 2019-11-21 | grenoble | troll | Intel Xeon | Gold 5218 | Cascade Lake-SP | 2.3 GHz | Dell PowerEdge R640 | yes | yes | intel_pstate | intel_idle
2020-01-06 | 2019-11-07 | lyon | pyxis | ThunderX2 | 99xx | Vulcan | 2.2 GHz | Gigabyte R181-T92-00 | yes | no | cppc_cpufreq | none
2020-08-20 | 2020-07-09 | nancy | grappe | Intel Xeon | Gold 5218R | Cascade Lake-SP | 2.1 GHz | Dell PowerEdge R640 | yes | yes | intel_pstate | intel_idle
2020-10-05 | 2016-10-17 | grenoble | drac | POWER8NVL | 1.0 | POWER8 | 4.0 GHz | IBM PowerNV S822LC (8335-GTB) | yes | no | powernv-cpufreq | powernv_idle
2021-01-13 | 2020-12-07 | nancy | grouille | AMD EPYC | 7452 | Zen 2 | 2.1 GHz | Dell PowerEdge R7525 | yes | yes | acpi-cpufreq | acpi_idle
2021-05-17 | 2021-05-17 | lyon | neowise | AMD EPYC | 7642 | Zen 2 | 2.1 GHz | AMD-Penguin Computing | yes | yes | acpi-cpufreq | acpi_idle
2021-08-26 | 2021-06-24 | nancy | gruss | AMD EPYC | 7352 | Zen 2 | 2.4 GHz | Dell PowerEdge R7525 | yes | yes | acpi-cpufreq | acpi_idle
2021-11-18 | 2021-11-24 | lyon | sirius | AMD EPYC | 7742 | Zen 2 | 2.1 GHz | NVidia DGX A100 | yes | yes | acpi-cpufreq | none
2021-12-15 | 2021-12-10 | grenoble | servan | AMD EPYC | 7352 | Zen 2 | 2.3 GHz | Dell PowerEdge R7525 | yes | yes | acpi-cpufreq | acpi_idle
2022-09-07 | 2022-06-22 | nancy | grat | AMD EPYC | 7513 | Zen 3 | 2.1 GHz | HPE Apollo 6500 | yes | no | none | acpi_idle
2022-12-01 | 2022-11-06 | toulouse | montcalm | Intel Xeon | Silver 4314 | Ice Lake-SP | 3.4 GHz | HPE Proliant DL360 Gen10+ | yes | yes | intel_pstate | intel_idle
2023-06-05 | 2022-10-24 | rennes | abacus25 | AMD EPYC | 7413 | Zen 3 | 2.65 GHz | HPE ProLiant DL385 Gen10+ v2 | yes | yes | acpi-cpufreq | acpi_idle
2023-06-13 | 2022-12-01 | toulouse | estats | Carmel | Unknown | Carmel | 2.26 GHz | Connecttech/Nvidia Jetson AGX Xavier | no | no | tegra194 | tegra19x_cpuidle_driver
2023-08-01 | 2022-12-14 | strasbourg | fleckenstein | Intel Xeon | Silver 4314 | Ice Lake-SP | 3.4 GHz | HPE Proliant DL360 Gen10+ | yes | yes | intel_pstate | intel_idle
2023-09-06 | 2018-10-22 | rennes | roazhon2 | Intel Xeon | Gold 5120 | Skylake-SP | 2.2 GHz | Dell PowerEdge R640 | yes | yes | intel_pstate | intel_idle
2023-09-06 | 2020-12-01 | rennes | roazhon4 | AMD EPYC | 7H12 | Zen 2 | 2.6 GHz | Dell PowerEdge R7525 | yes | yes | acpi-cpufreq | acpi_idle
2023-10-06 | 2019-09-25 | rennes | roazhon13 | Intel Xeon | Gold 6254 | Cascade Lake-SP | 3.1 GHz | Dell PowerEdge C6420 | yes | yes | intel_pstate | intel_idle
2023-10-09 | 2017-11-13 | rennes | abacus4 | Intel Xeon | Silver 4114 | Skylake | 2.65 GHz | Dell PowerEdge R740 | yes | yes | intel_pstate | intel_idle
2023-10-09 | 2017-12-15 | rennes | roazhon9 | Intel Xeon | Gold 5118 | Skylake | 2.3 GHz | Dell PowerEdge C6420 | yes | yes | intel_pstate | intel_idle
2023-10-11 | 2020-11-13 | rennes | abacus18 | AMD EPYC | 7352 | Zen 2 | 2.3 GHz | Dell PowerEdge R7525 | yes | yes | acpi-cpufreq | acpi_idle
2023-10-11 | 2020-12-29 | rennes | abacus20 | Intel Xeon | Gold 5220R | Cascade Lake-SP | 2.2 GHz | Dell PowerEdge T640 | yes | yes | intel_pstate | intel_idle
2023-10-13 | 2015-04-06 | rennes | roazhon5 | Intel Xeon | E5-2660 v3 | Haswell | 2.6 GHz | Dell PowerEdge M630 | yes | yes | intel_cpufreq | intel_idle
2023-10-13 | 2017-07-05 | rennes | abacus3 | Intel Xeon | E5-2620 v4 | Broadwell | 2.1 GHz | Dell PowerEdge T640 | yes | yes | intel_cpufreq | intel_idle
2023-10-16 | 2019-09-26 | rennes | abacus16 | Intel Xeon | Silver 4214 | Cascade Lake-SP | 2.2 GHz | Dell PowerEdge C4140 | yes | yes | intel_pstate | intel_idle
2023-10-24 | 2012-09-14 | rennes | roazhon11 | Intel Xeon | E5-2660 | Sandy Bridge | 2.2 GHz | Dell PowerEdge C6220 | yes | yes | intel_cpufreq | intel_idle
2023-10-24 | 2012-09-14 | rennes | roazhon12 | Intel Xeon | E5-2660 | Sandy Bridge | 2.2 GHz | Dell PowerEdge C6220 | yes | yes | intel_cpufreq | intel_idle
2023-10-24 | 2016-07-27 | rennes | abacus1 | Intel Xeon | E5-2650 v4 | Broadwell | 2.2 GHz | Dell PowerEdge R730 | yes | yes | intel_cpufreq | intel_idle
2023-10-24 | 2017-06-22 | rennes | roazhon8 | Intel Xeon | E5-2650 v4 | Broadwell | 2.2 GHz | Dell PowerEdge C6320 | yes | yes | intel_cpufreq | intel_idle
2023-10-24 | 2017-10-25 | rennes | roazhon6 | Intel Xeon | E5-2650 v4 | Broadwell | 2.2 GHz | Dell PowerEdge M630 | yes | yes | intel_cpufreq | intel_idle
2023-10-24 | 2018-05-06 | rennes | abacus5 | Intel Xeon | Gold 5115 | Skylake-SP | 2.4 GHz | Dell PowerEdge R740 | yes | yes | intel_pstate | intel_idle
2023-10-24 | 2018-09-20 | rennes | roazhon1 | Intel Xeon | Gold 6154 | Skylake-SP | 3.0 GHz | Dell PowerEdge R640 | yes | yes | intel_pstate | intel_idle
2023-10-24 | 2018-11-12 | rennes | roazhon3 | Intel Xeon | Gold 6142 | Skylake-SP | 2.6 GHz | Dell PowerEdge R640 | yes | yes | intel_pstate | intel_idle
2023-10-24 | 2018-12-03 | rennes | abacus11 | Intel Xeon | Silver 4114 | Skylake-SP | 2.2 GHz | Dell PowerEdge T640 | yes | yes | intel_pstate | intel_idle
2023-10-24 | 2019-09-25 | rennes | roazhon10 | Intel Xeon | Gold 5220 | Cascade Lake-SP | 2.2 GHz | Dell PowerEdge MX740c | yes | yes | intel_pstate | intel_idle
2023-10-24 | 2021-08-02 | rennes | abacus21 | AMD EPYC | 7F72 | Zen 2 | 3.2 GHz | Dell PowerEdge R7525 | yes | yes | acpi-cpufreq | acpi_idle
2023-10-30 | 2018-12-03 | rennes | abacus10 | Intel Xeon | Silver 4114 | Skylake-SP | 2.2 GHz | Dell PowerEdge T640 | yes | yes | intel_pstate | intel_idle
2023-11-06 | 2018-12-12 | rennes | abacus9 | Intel Xeon | Silver 4114 | Skylake-SP | 2.2 GHz | Dell PowerEdge C4140 | yes | yes | intel_pstate | intel_idle
2023-11-08 | 2019-09-26 | rennes | abacus14 | Intel Xeon | Silver 4214 | Skylake-SP | 2.65 GHz | Dell PowerEdge C4140 | yes | yes | intel_pstate | intel_idle
2023-11-10 | 2017-08-15 | rennes | abacus2 | Intel Xeon | E5-2609 v4 | Broadwell | 1.7 GHz | Dell PowerEdge R730 | no | no | intel_cpufreq | intel_idle
2023-11-17 | 2018-11-09 | rennes | abacus8 | Intel Xeon | Gold 5120 | Skylake-SP | 2.2 GHz | Dell PowerEdge R7920 | yes | yes | intel_pstate | intel_idle
2023-11-17 | 2019-10-28 | rennes | abacus12 | Intel Xeon | Gold 6248 | Cascade Lake-SP | 2.5 GHz | Dell PowerEdge R740 | yes | yes | intel_pstate | intel_idle
2023-11-17 | 2020-11-12 | rennes | abacus19 | Intel Xeon | Gold 5220R | Cascade Lake-SP | 2.2 GHz | Dell PowerEdge R740 | yes | yes | intel_pstate | intel_idle
2023-11-17 | 2020-11-25 | rennes | abacus17 | AMD EPYC | 7282 | Zen 2 | 2.8 GHz | Dell PowerEdge R7525 | yes | yes | acpi-cpufreq | acpi_idle
2023-12-04 | 2022-01-14 | rennes | abacus22 | AMD EPYC | 7413 | Zen 3 | 2.65 GHz | ProLiant DL385 Gen10 Plus v2 | yes | yes | acpi-cpufreq | acpi_idle
2023-12-05 | 2023-11-30 | nancy | grosminet | Intel Xeon | Gold 6240L | Cascade Lake-SP | 2.6 GHz | Proliant DL560 Gen10 | yes | yes | intel_pstate | intel_idle
2023-12-20 | 2017-06-27 | rennes | roazhon7 | Intel Xeon | E5-2650 v4 | Broadwell | 2.2 GHz | Dell PowerEdge C6320 | yes | yes | intel_cpufreq | intel_idle
2024-01-02 | 2023-05-27 | rennes | paradoxe | Intel Xeon | Gold 5320 | Ice Lake-SP | 2.2 GHz | HPE Proliant DL360 Gen10+ | yes | yes | intel_pstate | intel_idle
2024-01-10 | 2015-10-23 | nancy | grostiti | Intel Xeon | E7-4850 v3 | Haswell | 2.2 GHz | Dell PowerEdge R930 | yes | yes | intel_cpufreq | intel_idle
2024-01-22 | 2023-05-02 | lille | chuc | AMD EPYC | 7513 | Zen 3 | 2.6 GHz | Apollo 6500 Gen10 Plus | yes | yes | acpi-cpufreq | acpi_idle
2024-01-25 | 2023-05-02 | lille | chirop | Intel Xeon | Platinum 8358 | Ice Lake-SP | 2.6 GHz | DL360 Gen10+ | yes | yes | intel_pstate | intel_idle
2024-08-23 | 2024-08-07 | nancy | gres | AMD EPYC | 9254 | Zen 4 | 2.9 GHz | ProLiant DL385 Gen11 | yes | yes | acpi-cpufreq | acpi_idle
2024-09-27 | 2024-09-02 | nancy | grdix | AMD EPYC | 9754 | Zen 4c | 2.25 GHz | ProLiant DL365 Gen11 | yes | yes | acpi-cpufreq | acpi_idle
2024-11-08 | 2024-08-19 | grenoble | nessie | Intel Xeon | Gold 6430 | Golden Cove | 3.4 GHz | HPE ProLiant DL385 Gen10+ v2 | yes | yes | intel_pstate | intel_idle

Last generated from the Grid'5000 Reference API on 2024-12-20 (commit 387e79b712)


Up-to-date information can be found by querying the Grid'5000 Reference-API:

See also Hardware for more information about Grid'5000 hardware.

Checking the configuration

  • The CPU configuration is checked automatically by g5k-checks and nodes are disabled if the configuration is wrong.
  • You can also retrieve the CPU configuration of a node by running g5k-checks manually:
fnancy: oarsub -I -l nodes=1,walltime=00:30 -t deploy
fnancy: kadeploy3 -f $OAR_NODE_FILE -u deploy -e debian11-x64-std -k
node: g5k-checks -m api
node: cat /tmp/*.yaml
  • The Execo user guide also provides an example for checking the CPU performance settings of Grid'5000 clusters.

Setting CPU parameters: Hyperthreading, C-State, P-State and Turboboost

The following explains how to change the CPU parameters by using either sysfs (with root privileges) or kernel boot parameters.

Hyperthreading (HT)

On Intel Xeon, HT provides two logical cores per physical core. With HT, one physical core appears as two processors to the operating system.

Checking the configuration

  • Is the processor HT-capable?
    • cat /proc/cpuinfo | grep flags: flag list should include ‘ht’
  • Is HT enabled?
    • lscpu | grep 'Thread(s) per core': 1 means HT is disabled, 2 means HT is enabled
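The check can also be scripted, for example to compare many nodes. Below is a minimal Ruby sketch; the helper names threads_per_core and ht_enabled? are hypothetical. It assumes the English "Thread(s) per core:" label (run lscpu with LC_ALL=C) and tolerates the indentation that some lscpu versions put in front of that line:

```ruby
# Parse the "Thread(s) per core:" line of lscpu output.
# Returns an Integer, or nil if the line is missing.
def threads_per_core(lscpu_output)
  line = lscpu_output.lines.find { |l| l.lstrip.start_with?("Thread(s) per core:") }
  line && Integer(line.split(":", 2).last.strip, 10)
end

# Interpretation used on this page: 1 => HT disabled, 2 or more => HT enabled.
def ht_enabled?(lscpu_output)
  n = threads_per_core(lscpu_output)
  !n.nil? && n > 1
end

# Usage on a node (lscpu output is localized, hence LC_ALL=C):
#   out = IO.popen({ "LC_ALL" => "C" }, ["lscpu"], &:read)
#   puts ht_enabled?(out)
```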

Enabling/Disabling HT

  • HT must be turned on in the BIOS configuration to enable changing the HT configuration with kernel boot parameters or as root at runtime. Every cluster installed since 2012 is configured like that.
  • Note that on some clusters, the BIOS parameters might be ignored, see Bug #5229.

Setting up HT using root privileges (recommended)

  • HT can be turned ON or OFF using the cpu-hotplug capability of the Linux kernel.
  • Interface: /sys/devices/system/cpu/cpu*
  • To disable HT:
    • for i in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | awk -F',' '{print $2}' | sort -u); do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
    • if the processor has more than two hyperthreads per core: for i in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -d ',' -f 2- | tr , '\n'); do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
  • To enable HT:
    • for i in $(ls /sys/devices/system/cpu/cpu*/online); do echo $i; echo 1 > $i; done
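The first one-liner above assumes that thread_siblings_list is comma-separated (e.g. 0,32). The kernel prints this file as a CPU list, so it can also contain ranges such as 0-1 when sibling threads are numbered consecutively. A minimal Ruby sketch that handles both forms and keeps the lowest-numbered thread of each core online; the helper names are hypothetical, and the Sysfs root is a parameter so the logic can be tried on a copy of the tree:

```ruby
# Expand a kernel CPU list such as "0,32" or "0-1,64-65" into Integers.
def parse_cpu_list(text)
  text.strip.split(",").flat_map do |part|
    first, last = part.split("-", 2).map { |s| Integer(s, 10) }
    last ? (first..last).to_a : [first]
  end
end

# Given the contents of every thread_siblings_list file, return the logical
# CPUs to take offline: every sibling except the lowest-numbered one.
def siblings_to_offline(sibling_lists)
  sibling_lists.flat_map { |list| parse_cpu_list(list).sort.drop(1) }.uniq.sort
end

# Take those CPUs offline (root required).
def disable_ht(root = "/sys/devices/system/cpu")
  lists = Dir.glob(File.join(root, "cpu[0-9]*", "topology", "thread_siblings_list"))
             .map { |path| File.read(path) }
  siblings_to_offline(lists).each do |cpu|
    File.write(File.join(root, "cpu#{cpu}", "online"), "0")
  end
end
```

Since an offline CPU no longer exposes its topology directory, running disable_ht a second time finds nothing left to offline.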

Note: HT must be turned ON in the BIOS configuration, and the kernel boot parameters should not restrict the visibility of the logical cores (i.e. either no such boot parameter, or maxcpus and additional_cpus set to the number of logical cores; see below).

Kernel boot command-line parameters (alternative)

HT settings can also be controlled with the kernel boot parameter maxcpus=n, which restricts the number of CPUs brought up at boot time to n. It can be used to turn off HT: for example, booting a quad-core machine with maxcpus=4 disables HT properly because each of the first 4 logical cores corresponds to a different physical core. See bug #5229 for additional information.

With Kadeploy, you can specify kernel boot parameters on the environment description file:

  • Get the description of debian11-x64-base:
kaenv3 -p debian11-x64-base -u deploy > mydebian-x64-base.env
  • Edit the mydebian-x64-base.env and add a kernel_params entry within the existing boot entry of the YAML file:
boot:
  kernel: "/vmlinuz"
  initrd: "/initrd.img"
  kernel_params: maxcpus=4
  • Deploy the environment:
oarsub -I -t deploy -l nodes=1,walltime=1
kadeploy3 -f $OAR_NODEFILE -a mydebian-x64-base.env -k
  • Note that the kernel parameters specified in the environment description file are appended to the default kernel parameters of the cluster.
  • You can check the kernel parameters with cat /proc/cmdline
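When several environment variants are needed (for instance with different maxcpus values), the edit above can be scripted with Ruby's standard YAML library. A sketch, assuming the environment file has the boot mapping shown above; add_kernel_param is a hypothetical helper, and dumping the YAML again drops any comments from the file:

```ruby
require "yaml"

# Return a copy of a Kadeploy environment description (YAML text) with
# +param+ appended to boot.kernel_params (created if absent).
def add_kernel_param(env_yaml, param)
  env = YAML.safe_load(env_yaml)
  boot = env.fetch("boot")
  boot["kernel_params"] = [boot["kernel_params"], param].compact.join(" ").strip
  env.to_yaml
end

# Usage, on the file produced by kaenv3 -p debian11-x64-base -u deploy:
#   text = File.read("mydebian-x64-base.env")
#   File.write("mydebian-x64-base.env", add_kernel_param(text, "maxcpus=4"))
```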

References

C-States

C-States are power modes that put various processor subsystems to sleep when the CPU is idle.

Deeper C-State levels save more power but take longer to bring the CPU back to the active state. The C-States are:

  • C0: the CPU is actively running code (i.e. it is the non-idle state)
  • C1: the CPU is idle but quick to wake up
  • C2 and up: extra power saving states

There is also a C-State called C1E: C1E replaces C1 when it is enabled in both the BIOS and the OS configuration. C1E additionally lowers the CPU's frequency and voltage.

The Dell whitepaper about Controlling Processor C-State Usage in Linux is a great resource to learn about C-States.

C-States are managed by the operating system using the cpuidle subsystem and an idle driver (either intel_idle or acpi_idle). The default driver is intel_idle on recent kernels and Intel hardware.

C-States Drivers

intel_idle driver

  • This driver does not use ACPI. It directly uses knowledge of Intel CPU hardware.
  • The C-States list of this driver might differ from the list provided by ACPI.
  • C1E can be disabled through the driver, since it is listed as one of the C-States.

acpi_idle driver

  • This driver is used when intel_idle is disabled.
  • It takes into account both the BIOS parameters and the kernel parameters.
  • Procfs interface (older kernels only, see below): cat /proc/acpi/processor/CPUx/power
  • Companion tool: acpitool -c

Checking the configuration (Sysfs)

  • Driver in use:
    • cat /sys/devices/system/cpu/cpuidle/current_driver
  • Name and exit latency (in microseconds) of C-States:
    • cat /sys/devices/system/cpu/cpu*/cpuidle/state*/name
    • cat /sys/devices/system/cpu/cpu*/cpuidle/state*/latency
  • Idle state statistics:
    • /sys/devices/system/cpu/cpu*/cpuidle/state*/usage
    • Idle state statistics can be retrieved more easily with cpupower (see below).
  • Sysfs also provides an interface to know whether C-States are disabled, but as there are multiple ways to disable C-States, you cannot rely on it. The most reliable way to check whether C-States are enabled or disabled is to monitor the CPU idle state statistics. Note also that C-States can be disabled independently of each other. For the record, here are the Sysfs interfaces that report whether C-States are disabled:
    • Are C-States disabled? cat /sys/devices/system/cpu/cpu*/cpuidle/state*/disable
      • This is only correct if the C-States were disabled with the sysfs interface. It is not correct when the CPU latency is limited with /dev/cpu_dma_latency.
    • Max C-State allowed by the intel_idle driver: cat /sys/module/intel_idle/parameters/max_cstate
      • It only provides the value corresponding to the intel_idle.max_cstate kernel parameter.
  • The /proc/acpi/processor/*/power interface has been removed from the kernel.
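The per-state files listed above can be collected into one table per CPU. A minimal Ruby sketch, assuming the standard cpuidle/state*/{name,latency,disable} layout; idle_states is a hypothetical helper. As noted above, a disable value of 0 does not prove that a state is actually reachable:

```ruby
# Read the cpuidle states of one CPU from Sysfs.
# Returns an array of hashes ordered by state index; latency is the exit
# latency in microseconds.
def idle_states(cpu = 0, root = "/sys/devices/system/cpu")
  dirs = Dir.glob(File.join(root, "cpu#{cpu}", "cpuidle", "state[0-9]*"))
  dirs.sort_by { |d| File.basename(d).delete_prefix("state").to_i }.map do |dir|
    read = ->(attr) { File.read(File.join(dir, attr)).strip }
    {
      index: File.basename(dir).delete_prefix("state").to_i,
      name: read.("name"),
      latency_us: read.("latency").to_i,
      disabled: read.("disable") == "1"
    }
  end
end

# Usage:
#   idle_states(0).each do |s|
#     printf("%-8s %6d us  disabled=%s\n", s[:name], s[:latency_us], s[:disabled])
#   end
```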

Checking the configuration (Tools)

Cpupower retrieves CPU information from the sysfs interface (apt-get install linux-cpupower):

  • cpupower idle-info works for both intel_idle and acpi_idle and provides the same information as /sys/devices/system/cpu/cpu*/cpuidle/state*/.
  • cpupower monitor -m Idle_Stats gives idle state statistics and is a reliable way to check if C-States are enabled. On the following example, C-States are fully enabled and the CPU is idle:
root@graphene-143:~# cpupower monitor -m Idle_Stats
    |Idle_Stats                        
CPU | POLL | C1-N | C1E- | C3-N | C6-N 
   0|  0.00|  0.00|  0.00|  0.00| 53.04
   1|  0.00|  0.00|  0.00|  0.00| 95.43
   2|  0.00|  0.00|  0.00|  0.04| 76.96
   3|  0.00|  0.00|  0.00|  0.00| 99.97
Idle_Stats
    Shows statistics of the cpuidle kernel subsystem. Values are retrieved from /sys/devices/system/cpu/cpu*/cpuidle/state*/. The kernel updates these values every time an idle state is entered or left. Therefore there can be some inaccuracy when cores are in an idle state for some time when the measure starts or ends. In worst case it can happen that one core stayed in an idle state for the whole measure time and the idle state usage time as exported by the kernel did not get updated. In this case a state residency of 0 percent is shown while it was 100.
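The percentages reported by Idle_Stats are residencies: the share of the measurement interval spent in each idle state. They can be approximated from two snapshots of the cumulative state*/time counters (time spent in the state, in microseconds). A minimal Ruby sketch of that arithmetic for one CPU, with hypothetical helper names; it has the same inaccuracy as described in the man page excerpt above, since the counters only move when a state is entered or left:

```ruby
# Cumulative time (microseconds) spent in each idle state of a CPU,
# keyed by state name.
def idle_times(cpu = 0, root = "/sys/devices/system/cpu")
  Dir.glob(File.join(root, "cpu#{cpu}", "cpuidle", "state[0-9]*")).to_h do |dir|
    [File.read(File.join(dir, "name")).strip, File.read(File.join(dir, "time")).to_i]
  end
end

# Percentage of +elapsed_us+ spent in each state between two snapshots.
def residency(before, after, elapsed_us)
  after.to_h { |name, t| [name, 100.0 * (t - before.fetch(name, 0)) / elapsed_us] }
end

# Usage:
#   t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond); s0 = idle_times(0)
#   sleep 1
#   t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond); s1 = idle_times(0)
#   residency(s0, s1, t1 - t0).each { |name, pct| printf("%-6s %6.2f%%\n", name, pct) }
```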

Extra tools you might find useful:

  • i7z: can be used to check C-state usage regardless of which idle driver is being used. It uses MSR information.
  • powertop: tool to find out what is using power
  • turbostat: report processor frequency and idle statistics
  • hwloc: detect the hierarchical topology of the hardware architectures

Enabling/Disabling C-States

Note that disabling C-States entirely (i.e. only allowing C0) interferes with HT: HT should be disabled when the CPU is forced to stay in the C0 state.

Dynamic Control of the C-States using root privileges (recommended)

  • C-States can be disabled on a per core and per C-State basis:
    • echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable
    • or cpupower idle-set -d 3
  • To re-enable a C-State:
    • echo 0 > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable
    • or cpupower idle-set -e 3
  • You can also limit the allowed C-States by using the Power Management Quality of Service (PM QoS) interface: requesting a low latency prevents the processor from entering deep sleep states.
    • The file /dev/cpu_dma_latency can be used to set a maximum allowable latency: write to this file the maximum allowed response time in microseconds, as a binary 32-bit integer (the kernel parses writes of any other length as a hexadecimal string).
    • This file must be kept open as long as you want to limit the latency.
    • The exit latency of each C-State, in microseconds, is given by /sys/devices/system/cpu/cpu*/cpuidle/state*/latency. Requesting "0" only allows C0.
    • Setting a maximum latency does not update /sys/devices/system/cpu/cpu0/cpuidle/state*/disable or /sys/module/intel_idle/parameters/max_cstate.
    • More information can be found here and here.
    • Here is a Ruby example for using /dev/cpu_dma_latency:
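#!/usr/bin/ruby
# Limit the CPU wake-up latency (and thus the usable C-States) through PM QoS.
# The limit only holds while this process keeps /dev/cpu_dma_latency open.

if ARGV.empty?
  puts "Usage: sudo ./limit_cstates.rb <latency in microseconds>"
  exit 1
end

latency = Integer(ARGV[0], 10)

f = File.open("/dev/cpu_dma_latency", "wb")
# Write the value as a native 32-bit signed integer: a 4-byte write is read as
# binary, whereas a text string of another length is parsed as hexadecimal.
f.syswrite([latency].pack("l"))

# Drop the request cleanly on Ctrl-C or kill (closing the file releases it).
%w[INT TERM].each do |sig|
  Signal.trap(sig) do
    f.close
    exit
  end
end

sleep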

On taurus, limiting the latency to 80 µs (the C3 latency) seemed to also allow C-States up to C7, whereas on graphene, limiting the latency to the C3 latency works as expected.
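The echo commands above only act on cpu0. A minimal Ruby sketch that applies the same per-state toggle to every CPU through Sysfs (root required); set_idle_state is a hypothetical helper:

```ruby
# Enable or disable one idle state (by index) on every CPU that exposes it.
# Writes 1 to cpuidle/stateN/disable to disable the state, 0 to re-enable it.
# Returns the number of CPUs updated.
def set_idle_state(index, disabled, root = "/sys/devices/system/cpu")
  paths = Dir.glob(File.join(root, "cpu[0-9]*", "cpuidle", "state#{index}", "disable"))
  paths.each { |path| File.write(path, disabled ? "1" : "0") }
  paths.size
end

# Usage, same intent as "cpupower idle-set -d 3" and "cpupower idle-set -e 3":
#   set_idle_state(3, true)
#   set_idle_state(3, false)
```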

Kernel boot command-line parameters

intel_idle driver
  • This driver mostly ignores BIOS settings and kernel parameters, but in newer kernels idle=halt disables cpuidle, including intel_idle.
  • intel_idle.max_cstate=0 disables intel_idle, and the kernel falls back on acpi_idle.
  • intel_idle.max_cstate=[1-6] specifies the maximum depth of C-States.
acpi_idle driver
  • The depth of C-States can be set with processor.max_cstate=n
  • Note that when processor.max_cstate=0 is used, the kernel actually silently sets it to 1.
Other kernel parameters
  • idle=halt: allows C0-C1(E) (Halt means C1). It allows for low latency.
  • idle=poll: the CPU stays in C0 (Poll means C0). It allows for extremely low latency: the processor stays in C0 and is kept busy in a loop. It increases power usage considerably.
  • idle=mwait: has been removed from 3.x kernels.

Influence of C-States on power usage, network latency and P-States

C-States are a power saving feature of the CPU. Here are some power usage measurements performed on taurus (bug #6570) for different C-State limits (exit latency in parentheses):

  • C0/POLL (0 µs): 183 W
  • C1S (2 µs): 144 W
  • C1E (10 µs): 113 W

C-States impact CPU wake-up latencies and therefore network latency when the nodes are not busy (see bug #5368 for an example). You should disable C-States for maximum performance. Here are some ping latency measurements on the InfiniBand interface of graphene when nodes are idle:

  • C0 (0 µs): ping in ~ 0.030 ms
  • C1 (3 µs): ping in ~ 0.030 ms
  • C1E (10 µs): ping in ~ 0.045 ms
  • C3 (20 µs): ping in ~ 0.140 ms
  • C6 (200 µs): ping in ~ 0.200 ms

References

P-States

Modern CPUs support dynamic frequency scaling to reduce both the electric energy consumption and the heat generated by the processor. The P-States are the various frequency settings supported by the CPU. C-States and P-States are independent of each other. To understand the difference between C-States and P-States, you can read this or this.

P-States are managed by the cpufreq subsystem and a scaling driver (either intel_pstate or acpi-cpufreq). The default driver is intel_pstate on recent kernels and recent Intel hardware; the intel_cpufreq driver listed in the table above is intel_pstate running in passive mode. Each driver supports several governors, i.e. policies for the CPU frequency scaling algorithms.

P-States drivers

acpi_cpufreq driver

This driver supports five governors. The performance, powersave and userspace governors set the frequency statically, whereas the ondemand and conservative governors set the CPU frequency depending on the current CPU usage (dynamic scaling).

Governor | Description
performance | Sets the CPU frequency to the value defined in /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
powersave | Sets the CPU frequency to the value defined in /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
userspace | Sets the CPU frequency to the value defined in /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
ondemand | Sets the CPU frequency to scaling_max_freq when the CPU load threshold (default is 95%) is reached.
conservative | Same as ondemand but increases the frequency in steps. The default threshold is 80%.

Those governors are described in the Linux kernel documentation. The ondemand and conservative governors can be finely tuned using the sysfs parameters described in the documentation (/sys/devices/system/cpu/cpufreq/<governor>/* appears when the governor is in use).
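To give an intuition of the two dynamic governors, here is a simplified Ruby model of the decision taken after one sampling period. It is not the kernel implementation, which also handles sampling_down_factor, powersave_bias and other tunables, and lets the driver round the result to an available P-State. The function names are hypothetical; the ondemand threshold default follows the table above, while the proportional scaling below the threshold and the conservative defaults (down_threshold 20%, freq_step 5% of the maximum) are our reading of recent kernels:

```ruby
# Simplified model of one sampling period. Frequencies are in kHz and
# load is the CPU busy percentage (0-100).

# ondemand: jump to the maximum above the threshold, otherwise pick a
# frequency proportional to the load.
def ondemand_next(load, min_khz, max_khz, up_threshold: 95)
  return max_khz if load > up_threshold
  min_khz + load * (max_khz - min_khz) / 100
end

# conservative: move up or down by a fixed step (a percentage of the maximum),
# staying within [min_khz, max_khz].
def conservative_next(load, cur_khz, min_khz, max_khz,
                      up_threshold: 80, down_threshold: 20, freq_step: 5)
  step = max_khz * freq_step / 100
  if load > up_threshold
    [cur_khz + step, max_khz].min
  elsif load < down_threshold
    [cur_khz - step, min_khz].max
  else
    cur_khz
  end
end
```

With a 1.2 to 2.6 GHz range, a 50% load gives 1.9 GHz under ondemand, while conservative only moves from its current frequency by 130 MHz steps when the load leaves the 20-80% band.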

intel_pstate driver

  • This driver supports two governors: performance and powersave.
  • The performance governor is similar to the acpi_cpufreq performance governor, but the intel_pstate powersave governor adapts to the CPU usage: it behaves like the ondemand governor of the acpi_cpufreq driver.
  • The meaning of the data in /sys/devices/system/cpu/cpu*/cpufreq/ differs from the acpi_cpufreq driver. See intel-pstate.txt for more information.
  • The driver also provides its own sysfs interface within /sys/devices/system/cpu/intel_pstate/.

Checking the configuration

cpupower frequency-info provides the cpufreq kernel information in a consolidated manner. The information is gathered from the sysfs /sys/devices/system/cpu/cpu*/cpufreq/ interface. The sysfs interface is described here.

Driver and hardware capability:

  • scaling_driver is the driver in use.
  • scaling_available_governors lists the available governors for the driver in use.
  • scaling_available_frequencies lists the frequencies that are available with your CPU model (acpi-cpufreq only).
  • cpuinfo_min_freq and cpuinfo_max_freq give the frequency range capability of the hardware.

P-States configuration:

  • scaling_governor is the current governor.
  • scaling_max_freq and scaling_min_freq: current frequency range limit used by the governor. When setting a policy you need to first set scaling_max_freq, then scaling_min_freq.
  • cpuinfo_cur_freq: current frequency of the CPU as obtained from the hardware, in kHz.
  • scaling_cur_freq: frequency the kernel thinks the CPU runs at, in kHz. The Intel driver shows the frequency in use, but acpi-cpufreq displays the requested frequency (this might change in the future). The same is true for /proc/cpuinfo.
  • scaling_setspeed (write-only): is used by the userspace governor of the acpi-cpufreq driver. Irrelevant for other governors or the intel driver.
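The attributes above can be collected in one pass. A minimal Ruby sketch; cpufreq_info is a hypothetical helper. Files that do not exist for the current driver (for example scaling_available_frequencies with intel_pstate) or that are not readable by the current user (cpuinfo_cur_freq is usually root-only) are simply skipped, and frequencies are returned as raw kHz strings:

```ruby
# Attributes described above, read from /sys/devices/system/cpu/cpuN/cpufreq/.
CPUFREQ_ATTRS = %w[
  scaling_driver scaling_governor scaling_available_governors
  cpuinfo_min_freq cpuinfo_max_freq scaling_min_freq scaling_max_freq
  cpuinfo_cur_freq scaling_cur_freq scaling_available_frequencies
].freeze

# Return a Hash of the attributes that exist and are readable for +cpu+.
def cpufreq_info(cpu = 0, root = "/sys/devices/system/cpu")
  dir = File.join(root, "cpu#{cpu}", "cpufreq")
  CPUFREQ_ATTRS.each_with_object({}) do |attr, info|
    path = File.join(dir, attr)
    next unless File.readable?(path)
    info[attr] = File.read(path).strip
  end
end

# Usage:
#   cpufreq_info(0).each { |k, v| puts "#{k}: #{v}" }
```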

cpufrequtils is unmaintained and deprecated; cpupower (sys-power/cpupower, or linux-cpupower on Debian) should be used instead.

Setting up P-States using root privileges

The P-States configuration can be modified using cpupower frequency-set or by echoing into the following files: scaling_governor, scaling_max_freq, scaling_min_freq and scaling_setspeed. Modifying some parameters might be irrelevant for the current driver or the current governor.

$ cpupower frequency-set -g ondemand # set governor
$ cpupower frequency-set -u 2.60GHz    # set the maximum allowed frequency
$ cpupower frequency-set -d 1.80GHz    # set the minimum allowed frequency
$ cpupower frequency-info # check the configuration
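cpupower accepts values with units such as 2.60GHz, whereas the Sysfs files expect a plain number in kHz. A minimal Ruby sketch of the Sysfs route; to_khz and set_freq_range are hypothetical helpers. It writes scaling_max_freq before scaling_min_freq, following the ordering advice given above:

```ruby
# Convert "2.60GHz", "1800MHz", "2600000kHz" or a bare kHz number to kHz.
def to_khz(value)
  m = value.to_s.strip.match(/\A(\d+(?:\.\d+)?)\s*(GHz|MHz|kHz)?\z/i)
  raise ArgumentError, "unrecognized frequency: #{value.inspect}" unless m
  factor = { "ghz" => 1_000_000, "mhz" => 1_000, "khz" => 1 }.fetch((m[2] || "kHz").downcase)
  (Rational(m[1]) * factor).round # Rational avoids floating-point rounding
end

# Apply a frequency range to every CPU (root required): max first, then min.
def set_freq_range(max, min, root = "/sys/devices/system/cpu")
  Dir.glob(File.join(root, "cpu[0-9]*", "cpufreq")).each do |dir|
    File.write(File.join(dir, "scaling_max_freq"), to_khz(max).to_s)
    File.write(File.join(dir, "scaling_min_freq"), to_khz(min).to_s)
  end
end

# Usage, same intent as the -u/-d cpupower commands above:
#   set_freq_range("2.60GHz", "1.80GHz")
```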

On Debian systems, you can make changes permanent by editing /etc/default/cpufrequtils. This is the configuration file of the cpufrequtils daemon. Note that other distributions might use another daemon (like cpupower and /etc/default/cpupower). You can also use sysfsutils and /etc/sysfs.conf for managing the configuration.

# /etc/default/cpufrequtils
governor="ondemand"
max_freq="2.60GHz"
min_freq="1.80GHz"
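With sysfsutils, /etc/sysfs.conf takes lines of the form path = value, with paths relative to /sys. The sketch below prints such lines for every CPU found, assuming the same policy as the cpufrequtils example (values in kHz); sysfs_conf_lines and SYSFS_ROOT are hypothetical names. Review the output before appending it, e.g. sysfs_conf_lines ondemand 2600000 1800000 | tee -a /etc/sysfs.conf as root.

```shell
# Print /etc/sysfs.conf lines (sysfsutils syntax) setting governor, then
# scaling_max_freq, then scaling_min_freq for every CPU.
sysfs_conf_lines() {
    local governor="$1" max_khz="$2" min_khz="$3" root dir rel
    root="${SYSFS_ROOT:-/sys}"
    for dir in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        rel=${dir#"$root"/}               # path relative to /sys
        echo "$rel/scaling_governor = $governor"
        echo "$rel/scaling_max_freq = $max_khz"
        echo "$rel/scaling_min_freq = $min_khz"
    done
}
```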

intel_pstate driver specifics:

  • The scaling_governor is the same for all CPUs: the last requested policy applies to every CPU.
  • scaling_max_freq and scaling_min_freq can be used to set the P-State range of the CPUs, but as frequencies are converted to the nearest available P-State, this is subject to rounding errors. You can instead use the /sys/devices/system/cpu/intel_pstate/[min_perf_pct,max_perf_pct] parameters of the Intel P-State sysfs interface.
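A sketch of the percentage approach. It assumes that cpuinfo_max_freq of cpu0 corresponds to the 100% level (the maximum turbo frequency on turbo-capable CPUs) and that all CPUs share it; check this against the intel_pstate documentation of your kernel. freq_to_perf_pct, set_intel_pstate_range and SYSFS_ROOT are hypothetical names.

```shell
# Convert a frequency (kHz) to a percentage of cpuinfo_max_freq, rounded to nearest.
freq_to_perf_pct() {
    local target_khz="$1" max_khz
    max_khz=$(cat "${SYSFS_ROOT:-/sys}/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq") || return 1
    echo $(( (target_khz * 100 + max_khz / 2) / max_khz ))
}

# Restrict intel_pstate to MIN_KHZ..MAX_KHZ (run as root).
# max_perf_pct is written before min_perf_pct, as for scaling_max/min_freq.
set_intel_pstate_range() {
    local base min_pct max_pct
    base="${SYSFS_ROOT:-/sys}/devices/system/cpu/intel_pstate"
    min_pct=$(freq_to_perf_pct "$1") || return 1
    max_pct=$(freq_to_perf_pct "$2") || return 1
    echo "$max_pct" > "$base/max_perf_pct" || return 1
    echo "$min_pct" > "$base/min_perf_pct"
}
```

On a hypothetical CPU whose cpuinfo_max_freq is 3500000 kHz, set_intel_pstate_range 1800000 2600000 writes 74 to max_perf_pct and 51 to min_perf_pct.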

References

Related bugs

  • #5327 Low CPU frequency after deployment (fixed)
  • #5368 IPoIB latency on graphene (fixed)
  • #6281 Econome problem on P-State (fixed)
  • #6397 RefAPI CPU Mhz (fixed)
  • #6570 Power consumption with jessie-std (fixed)

Turboboost

Turboboost allows cores to run above their nominal frequency when CPU demand is high, as long as the electrical and thermal limits stay within specification.

Checking the configuration

The Sysfs interface depends on the P-State driver in use:

  • To check the driver in use: cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
  • With the intel_pstate driver: cat /sys/devices/system/cpu/intel_pstate/no_turbo (1 == disabled)
  • With the acpi-cpufreq driver: cat /sys/devices/system/cpu/cpufreq/boost (1 == enabled)
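These checks can be combined into one helper that prints enabled or disabled whatever the driver. This sketch reads only cpu0's scaling_driver, assuming all CPUs use the same driver; it also accepts intel_cpufreq (the name reported when intel_pstate runs in passive mode), assuming intel_pstate/no_turbo applies there too. turbo_status and SYSFS_ROOT are hypothetical names.

```shell
# Print "enabled" or "disabled" from the active driver's sysfs file.
# Returns 1 for drivers it does not know about.
turbo_status() {
    local base driver
    base="${SYSFS_ROOT:-/sys}/devices/system/cpu"
    driver=$(cat "$base/cpu0/cpufreq/scaling_driver") || return 1
    case "$driver" in
        intel_pstate|intel_cpufreq)
            # no_turbo: 1 == disabled
            if [ "$(cat "$base/intel_pstate/no_turbo")" = 1 ]; then
                echo disabled
            else
                echo enabled
            fi ;;
        acpi-cpufreq)
            # boost: 1 == enabled
            if [ "$(cat "$base/cpufreq/boost")" = 1 ]; then
                echo enabled
            else
                echo disabled
            fi ;;
        *)
            echo "turbo_status: unsupported driver $driver" >&2
            return 1 ;;
    esac
}
```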

The cpupower frequency-info tool also provides information about Turboboost:

  boost state support:
    Supported: yes
    Active: yes

Enabling/Disabling Turboboost

Using the Sysfs interface:

  • With the acpi-cpufreq driver: echo 0 > /sys/devices/system/cpu/cpufreq/boost (0 == disabled)
  • With the intel_pstate driver: echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo (1 == disabled)
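A sketch wrapping both writes in one root-only helper, under the same assumptions as above (driver read from cpu0, intel_cpufreq treated like intel_pstate); set_turbo and SYSFS_ROOT are hypothetical names.

```shell
# Enable ("on") or disable ("off") Turboboost through the active driver's
# sysfs file. Must run as root.
set_turbo() {
    local base driver
    base="${SYSFS_ROOT:-/sys}/devices/system/cpu"
    driver=$(cat "$base/cpu0/cpufreq/scaling_driver") || return 1
    case "$1:$driver" in
        on:intel_pstate|on:intel_cpufreq)   echo 0 > "$base/intel_pstate/no_turbo" ;;
        off:intel_pstate|off:intel_cpufreq) echo 1 > "$base/intel_pstate/no_turbo" ;;
        on:acpi-cpufreq)                    echo 1 > "$base/cpufreq/boost" ;;
        off:acpi-cpufreq)                   echo 0 > "$base/cpufreq/boost" ;;
        *)
            echo "usage: set_turbo on|off (driver: $driver)" >&2
            return 1 ;;
    esac
}
```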

With the acpi-cpufreq driver, Turboboost can also be disabled by manually lowering the maximum CPU frequency, as Turboboost is represented by an extra frequency in the scaling_available_frequencies list (usually the highest entry, 1 MHz above the nominal frequency).
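A sketch of this approach, to run as root. It assumes the CPU supports Turboboost and that the turbo entry is the highest value of scaling_available_frequencies, so the second-highest value is taken as the nominal frequency; on a CPU without turbo this would needlessly drop one step. cap_below_turbo and SYSFS_ROOT are hypothetical names.

```shell
# acpi-cpufreq only: set scaling_max_freq of every CPU to the second-highest
# available frequency, i.e. just below the (assumed) turbo entry.
cap_below_turbo() {
    local dir nominal
    for dir in "${SYSFS_ROOT:-/sys}"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        # one frequency per line, highest first, then keep the second one
        nominal=$(tr -s ' ' '\n' < "$dir/scaling_available_frequencies" \
                  | grep -v '^$' | sort -rn | sed -n 2p)
        [ -n "$nominal" ] || return 1
        echo "$nominal" > "$dir/scaling_max_freq" || return 1
    done
}
```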

In any case, Turboboost can also be disabled through a model-specific register (MSR). See this page or this page for more information.
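As an illustration of the MSR route (not taken from the pages referenced above), the sketch below decodes bit 38 of IA32_MISC_ENABLE (MSR 0x1a0), which Intel documents as Turbo Mode Disable. It only interprets a value you have read yourself, for example with rdmsr -p 0 0x1a0 from msr-tools after loading the msr kernel module; writing MSRs is not covered. msr_turbo_state is a hypothetical helper.

```shell
# Decode the Turbo Mode Disable bit (bit 38) of IA32_MISC_ENABLE from a hex value.
# Prints "disabled" if the bit is set, "enabled" otherwise.
msr_turbo_state() {
    local value
    value="${1#0[xX]}"
    # accept only hexadecimal digits (with an optional 0x prefix)
    case "$value" in
        ''|*[!0-9a-fA-F]*) echo "msr_turbo_state: not a hex value: $1" >&2; return 1 ;;
    esac
    if [ $(( (0x$value >> 38) & 1 )) -eq 1 ]; then
        echo disabled
    else
        echo enabled
    fi
}
```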

References