Dynamic Power Management: A Quantitative Approach
by Johan De Gelas on January 18, 2010 2:00 AM EST
How Much Power?
All this hardcore testing just made us more curious. Would we be able to determine how much power the PCU of Nehalem actually saves? Let's add a little more machine code to our hardware C-state scripts. The MSR 3FCh contains the info we need. We test once again with two active chess threads.
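The scripts themselves are not reproduced here. As a rough illustration only, the sketch below shows how such a residency counter could be sampled in Python. It assumes a Linux machine with the `msr` kernel module loaded (which exposes each CPU's registers as `/dev/cpu/<n>/msr`, readable by root); `read_msr` and `residency_pct` are hypothetical helper names, and only the register number 3FCh comes from the text.

```python
import os
import struct

# Register number taken from the text above; which C-state counter it
# holds on a given CPU should be checked against Intel's documentation.
MSR_RESIDENCY = 0x3FC


def read_msr(path, register):
    """Read one 64-bit MSR. With the Linux msr driver, the file offset
    selects the register number."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, register)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def residency_pct(ticks_before, ticks_after, clockticks):
    """Share of the sampling interval spent in the sleep state, in percent."""
    return 100.0 * (ticks_after - ticks_before) / clockticks
```

On a suitable system, one would call `read_msr("/dev/cpu/0/msr", MSR_RESIDENCY)` before and after the test run and feed the two values, together with the elapsed clockticks, into `residency_pct`.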
|PCU Sleep State Comparison|
|Clockticks|Ticks spent in C3|Ticks spent in C6|Percentage C3|Percentage C6|
At first you may think that these measurements contradict our previous measurements even though they were measured in the same circumstances (two active threads + one measurement thread). But if you calculate how much time the cores spend on average in C6, you get 19%, in the same ballpark as our previous measurement (21%). Notice that the PCU forces the Xeon cores to move quickly from C3 to a deeper C6 sleep: only 3% (!) is spent in C3.
So this means that the ACPI C2 state consists of 13.85% C3 and 86.15% C6 (the C6 share is 18.76 / (3.02 + 18.76)). Let's take the ACPI readings again.
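The split can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two inputs are the C3 and C6 percentages quoted above, and `split_deep_sleep` is a hypothetical helper name.

```python
def split_deep_sleep(pct_c3, pct_c6):
    """Return the fraction of deep-sleep time spent in C3 and in C6."""
    total = pct_c3 + pct_c6
    return pct_c3 / total, pct_c6 / total


# Percentages of total time measured in C3 and C6 (from the text).
c3_share, c6_share = split_deep_sleep(3.02, 18.76)
print(f"C3 share: {c3_share:.1%}, C6 share: {c6_share:.1%}")
```

With the rounded table percentages as input, this gives roughly 13.9% and 86.1%, within rounding of the figures used in the text.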
|ACPI C-State Comparison|
So now we can calculate how much time the CPU actually spent in the real hardware C-states.
% time spent in C1 = 7% of 81% idle = +/- 5.7%
The "software" ACPI C2 state is mapped by the Xeon CPU to two "hardware CPU" states:
- % time spent in C3 = 13.85% out of 93% ACPI C2, at 81% idle = +/- 10.3%
- % time spent in C6 = 86.15% out of 93% ACPI C2, at 81% idle = +/- 65%
So our two threads of Chess caused the L3426 cores to spend:
- 19% in C0
- 5.7% in C1
- 10.3% in C3
- 65% (!) in C6
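The breakdown above is a chain of simple multiplications, sketched here in Python. All inputs are the percentages quoted in the text; the variable names are ours.

```python
IDLE = 0.81        # fraction of time the cores are idle
ACPI_C1 = 0.07     # share of idle time reported as ACPI C1
ACPI_C2 = 0.93     # share of idle time reported as ACPI C2
C3_SHARE = 0.1385  # part of ACPI C2 that is really hardware C3
C6_SHARE = 0.8615  # part of ACPI C2 that is really hardware C6

breakdown = {
    "C0": 1.0 - IDLE,
    "C1": ACPI_C1 * IDLE,
    "C3": C3_SHARE * ACPI_C2 * IDLE,
    "C6": C6_SHARE * ACPI_C2 * IDLE,
}

for state, share in breakdown.items():
    print(f"{state}: {share:.1%}")
```

The four shares add up to 100% and agree with the list above to within a few tenths of a percentage point of rounding.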
What effect would this have on the power consumption of the chip? Intel gives us a good idea of what each C-state consumes with the Xeon X3400 series. In the thermal specifications and design guidelines we find this table.
Intel does not give us C1 power, but let's assume it is 25W on the L3426; our industry sources tell us this should be close enough. If the complex circuitry of the PCU were not available, the CPU would be limited to the C1 state to save power. Other C-states would only be available if all cores were idle or the system was idle. We assume that C0 consumes 45W, which is not far from the truth either as the CPUs with low TDP tend to be quicker.
Total power w/o PCU
= 45W * 19% (C0) + 25W * 81% (C1)
= 28.8W
Total power with PCU
= 45W * 19% (C0) + 25W * 5.7% (C1) + 17W * 10.3% (C3) + 4W * 65% (C6)
= +/- 14.3W
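The two totals are residency-weighted averages, which a short Python sketch makes easy to reproduce or to rerun with other assumptions. The 45W (C0) and 25W (C1) figures are the assumptions stated above, the 17W (C3) and 4W (C6) figures are the ones used in the formula, and `average_power` is a hypothetical helper name.

```python
# Per-state package power in watts. C0 and C1 are assumptions made in
# the text; C3 and C6 are the values used in the formula above.
POWER_W = {"C0": 45.0, "C1": 25.0, "C3": 17.0, "C6": 4.0}


def average_power(residency):
    """Weight each state's power by the fraction of time spent in it."""
    return sum(POWER_W[state] * share for state, share in residency.items())


without_pcu = {"C0": 0.19, "C1": 0.81}
with_pcu = {"C0": 0.19, "C1": 0.057, "C3": 0.103, "C6": 0.65}

print(f"Without PCU: {average_power(without_pcu):.1f} W")
print(f"With PCU:    {average_power(with_pcu):.1f} W")
```

With these inputs the estimate drops from about 28.8W to about 14.3W, roughly half.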
The actual absolute numbers are not that important, but our simplified calculation shows that because the PCU forces the CPU to go very quickly to C6, the "Lynnfield" Xeon morphs from a rather mediocre low-power CPU into a "real" low-power CPU. 14W for four complex out-of-order cores is very impressive: less than 4W per core! Intel's claims are justified: the PCU enables the "Nehalem" based cores to run in a deep sleep C6 state, even if other cores are hard at work. To end with an interesting note: even with four threads active on the Xeon L3426, we found that the cores spent 11% of the time in C6.