How Much Power?

All this hardcore testing just made us more curious. Would we be able to determine how much power the PCU of Nehalem actually saves? Let's add a little more machine code to our hardware C-state scripts. The MSR 3FCh contains the info we need. We test once again with two active chess threads.
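The article does not show the machine code that was added to the scripts, and the tests ran on Windows. As a rough sketch only, this is how the same kind of residency counter could be read on a Linux machine through the `msr` driver's `/dev/cpu/N/msr` device. 3FCh is the register named above; 3FDh as the C6 counterpart is an assumption taken from the Linux kernel's MSR definitions, and the helper names (`decode_msr`, `read_msr`, `residency_delta`) are hypothetical.

```python
import os
import struct

MSR_CORE_C3_RESIDENCY = 0x3FC  # register named in the article
MSR_CORE_C6_RESIDENCY = 0x3FD  # assumption: C6 counterpart, per Linux msr-index.h

MASK_64 = 0xFFFFFFFFFFFFFFFF


def decode_msr(raw: bytes) -> int:
    """Turn the 8 bytes returned by the msr device into an unsigned 64-bit value."""
    if len(raw) != 8:
        raise ValueError("an MSR read must return exactly 8 bytes")
    return struct.unpack("<Q", raw)[0]


def read_msr(cpu: int, register: int) -> int:
    """Read one MSR on one logical CPU (needs root and the 'msr' kernel module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register number.
        return decode_msr(os.pread(fd, 8, register))
    finally:
        os.close(fd)


def residency_delta(before: int, after: int) -> int:
    """Ticks accumulated between two samples, tolerating 64-bit wrap-around."""
    return (after - before) & MASK_64
```

Sampling a counter before and after the benchmark run and dividing the delta by the elapsed clockticks gives the residency percentage for that core.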

PCU Sleep State Comparison

| | Clockticks | Ticks spent in C3 | Ticks spent in C6 | Percentage C3 | Percentage C6 |
|---|---|---|---|---|---|
| Core 1 | 2961889630 | 3497984 | 71450624 | 0.12% | 2.41% |
| Core 2 | 2989850634 | 4128768 | 768581632 | 0.14% | 25.71% |
| Core 3 | 3022277437 | 186195968 | 1032536064 | 6.16% | 34.16% |
| Core 4 | 3033988899 | 171286528 | 387645440 | 5.65% | 12.78% |
| Average | | | | 3.02% | 18.76% |

At first you may think that these measurements contradict our previous measurements even though they were measured in the same circumstances (two active threads + one measurement thread). But if you calculate how much time the cores spend on average in C6, you get 19%, in the same ballpark as our previous measurement (21%). Notice that the PCU forces the Xeon cores to move quickly from C3 to a deeper C6 sleep: only 3% (!) is spent in C3.
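The percentages and averages can be reproduced from the raw tick counts. The snippet below is a small sketch using the numbers from the table; the names `SAMPLES` and `residency_percentages` are made up for illustration.

```python
# (clockticks, ticks in C3, ticks in C6) per core, copied from the table above
SAMPLES = {
    "Core 1": (2961889630, 3497984, 71450624),
    "Core 2": (2989850634, 4128768, 768581632),
    "Core 3": (3022277437, 186195968, 1032536064),
    "Core 4": (3033988899, 171286528, 387645440),
}


def residency_percentages(samples):
    """Return {core: (percent in C3, percent in C6)} from raw tick counts."""
    result = {}
    for core, (clockticks, c3_ticks, c6_ticks) in samples.items():
        result[core] = (100.0 * c3_ticks / clockticks,
                        100.0 * c6_ticks / clockticks)
    return result


percentages = residency_percentages(SAMPLES)
avg_c3 = sum(c3 for c3, _ in percentages.values()) / len(percentages)
avg_c6 = sum(c6 for _, c6 in percentages.values()) / len(percentages)

for core, (c3, c6) in percentages.items():
    print(f"{core}: C3 {c3:.2f}%  C6 {c6:.2f}%")
print(f"Average: C3 {avg_c3:.2f}%  C6 {avg_c6:.2f}%")
```

Note that the average is taken over the per-core percentages, not over the summed ticks, which matches how the table's bottom row is computed.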

So this means that the ACPI C2 state consists of roughly 13.85% C3 (3.02 / (3.02 + 18.76)) and 86.15% C6 (18.76 / (3.02 + 18.76)). Let's take the ACPI readings again.

ACPI C-State Comparison

| | % idle | C1 (%) | C2 (%) | C3 (%) |
|---|---|---|---|---|
| Opteron 2435 | 86 | 100 | 0 | 0 |
| Xeon L3426 | 81 | 7 | 93 | 0 |
| Opteron 2389 | 72.44 | 100 | 0 | 0 |

So now we can calculate how much time the CPU actually spent in the real hardware C-states.

% time spent in C1 = 7% of 81% idle = +/- 5.7%

The "software" ACPI C2 states are mapped by the Xeon CPU to two "hardware CPU" states:

1. % time spent in C3 = 13.85% out of 93% ACPI C2, at 81% idle = +/- 10.3%
2. % time spent in C6 = 86.15% out of 93% ACPI C2, at 81% idle = +/- 65%

So our two threads of Chess caused the L3426 cores to spend:

• 19% in C0
• 5.7% in C1
• 10.3% in C3
• 65% (!) in C6

…on average.
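The mapping from ACPI readings to hardware C-states can be written out as a short calculation. This is a sketch under the article's own assumptions (81% idle, 7% of idle in ACPI C1, 93% in ACPI C2, and the MSR averages of 3.02% C3 and 18.76% C6); the function name is hypothetical. With unrounded intermediate values the C3 share comes out near 10.4%, a rounding hair away from the figure in the list.

```python
def hardware_cstate_breakdown(idle, acpi_c1, acpi_c2, msr_c3_pct, msr_c6_pct):
    """Split total time into C0/C1/C3/C6 fractions.

    idle        -- fraction of time the CPU is idle (ACPI view)
    acpi_c1/c2  -- fractions of idle time reported as ACPI C1 and C2
    msr_c3_pct, msr_c6_pct -- average hardware residencies measured via MSRs;
                              only their ratio is used to split ACPI C2
    """
    c3_share = msr_c3_pct / (msr_c3_pct + msr_c6_pct)
    c6_share = 1.0 - c3_share
    return {
        "C0": 1.0 - idle,
        "C1": idle * acpi_c1,
        "C3": idle * acpi_c2 * c3_share,
        "C6": idle * acpi_c2 * c6_share,
    }


breakdown = hardware_cstate_breakdown(0.81, 0.07, 0.93, 3.02, 18.76)
for state, fraction in breakdown.items():
    print(f"{state}: {100 * fraction:.1f}%")
```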

What effect would this have on the power consumption of the chip? Intel gives us a good idea of what each C-state consumes with the Xeon X3400 series. In the thermal specifications and design guidelines [6] we find this table.

Intel does not give us C1 power, but let's assume it is 25W on the L3426; our industry sources tell us this should be close enough. If the complex circuitry of the PCU was not available, the CPU would be limited to the C1 state to save power. Other C-states would only be available if all cores were idle or the system was idle. We assume that C0 consumes 45W, which is not far from the truth either as the CPUs with low TDP tend to be quicker.

Total power w/o PCU
= 45W * 19% (C0) + 25W * 81% (C1)
= 28.8W
Total Power with PCU
= 45W * 19% (C0) + 25W * 5.7% (C1) + 17W * 10.3% (C3) + 4W * 65% (C6)
= 14.3W
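Both totals are residency-weighted averages of the per-state power figures. A minimal sketch, using the article's assumed wattages (45W for C0 and 25W for C1 are the article's own estimates, 17W and 4W are the C3 and C6 values used in the sum above); `average_power` is an illustrative name.

```python
POWER_W = {"C0": 45.0, "C1": 25.0, "C3": 17.0, "C6": 4.0}  # assumed per-state power


def average_power(residency, power=POWER_W):
    """Weighted average power for a {state: fraction of time} mapping."""
    total = sum(residency.values())
    if abs(total - 1.0) > 0.01:
        raise ValueError(f"residencies should sum to about 1, got {total:.3f}")
    return sum(power[state] * fraction for state, fraction in residency.items())


# Without the PCU, all idle time is assumed to be spent in C1.
without_pcu = average_power({"C0": 0.19, "C1": 0.81})
# With the PCU, idle time is split over C1, C3 and C6 as derived earlier.
with_pcu = average_power({"C0": 0.19, "C1": 0.057, "C3": 0.103, "C6": 0.65})

print(f"Without PCU: {without_pcu:.1f}W")
print(f"With PCU:    {with_pcu:.1f}W")
```

Changing the entries in `POWER_W` shows how sensitive the conclusion is to the assumed C0 and C1 figures.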

The actual absolute numbers are not that important, but our simplified calculation shows that because the PCU forces the CPU into C6 very quickly, the "Lynnfield" Xeon morphs from a rather mediocre low power CPU into a "real" low power CPU. 14W for four complex out-of-order cores is very impressive: less than 4W per core! Intel's claims are justified: the PCU enables the "Nehalem"-based cores to run in a deep sleep C6 state, even if other cores are hard at work. To end on an interesting note: even with four threads active on the Xeon L3426, we found that the cores spent 11% of the time in C6.


• #### UrQuan3 - Thursday, January 21, 2010

I'm trying to remember for 2008, but wasn't there a way to either force or suggest thread/core affinity? It looks like the scheduler was hopping all over the place on the Opterons.
• #### JarredWalton - Thursday, January 21, 2010

You guys better pay attention and answer this post, or his species will try to enslave and/or wipe out the entire galaxy! ;-)
• #### mino - Wednesday, January 20, 2010

They are fine examples of low-power platforms, even if from vastly different markets.

But,
WHY ON EARTH DO YOU KEEP TALKING LIKE THEY WERE COMPARABLE THROUGHOUT THE ARTICLE ???
• #### IntelUser2000 - Wednesday, January 20, 2010

By the way, I don't know if you have the settings wrong or that's how it works, the Turbo Boost mode is not affected on the Home PC versions of Windows. Balanced uses Turbo Boost just as well on my Windows 7 Home Premium with Core i5 661.

• #### JarredWalton - Wednesday, January 20, 2010

I was wondering this as well, but I'm not familiar with Windows Server... what I do know is that Power Saver on consumer Windows OSes really limits the CPU frequency scaling features, and it sort of looks like Balanced on the Server OS has aspects of consumer "Power Saver" as well as some elements of "Balanced". Odd to see only two power settings available, where Win7 now has at least 3 and often 5.
• #### mino - Wednesday, January 20, 2010

It seems a classic example of KISS strategy of choosing the most-sensible options and so reducing decision complexity for IT people.

Modes like "Max battery" have anyway no reason for existence on a server box.
• #### RobinBee - Tuesday, January 19, 2010

If you use your pc as a music server:

Power saving methods ruin sound quality even if using a good sound card. The problem is »electronic« sound distortion. I do not know why this happens.

Also: The chosen number of IRQs per second in a network card can ruin sound quality too. Why, I do not know.
• #### Anato - Tuesday, January 19, 2010

I'm interested to see results from different operating systems which may be better at controlling processes on different CPUs. Namely, do they avoid CPU hopping, and is their power management as efficient as Windows'?

Most interested in:
Linux and Solaris
• #### JohanAnandtech - Tuesday, January 19, 2010

Excellent suggestion :-). The problem is keeping the application the same. We currently tested SQL Server 2008 on Windows 2008, and of course this cannot be done on Linux. However, I am no stranger to Linux as a server.

I am no fan of MySQL on Windows, but maybe this has improved. Would MySQL on Windows and Linux make sense as a comparison?
• #### maveric7911 - Tuesday, January 19, 2010

Why not use oracle ;)