Power Consumption and Uncore Update: Every TR2 CPU

As part of the 2990WX and 2950X review, we published data looking at the core, package, and uncore power delivery on the Threadripper 2 processors. A lot of users were skeptical of our data, which we had expected, and on the basis of comments and discussions, we slightly changed our testing and layout to get a better understanding of what is going on.

Our previous testing involved taking per-core and full-package power measurements under a CPU load with a set number of threads. In the first instance, our load was Prime95, which can be set to a fixed number of threads. Unfortunately, as observed, Prime95 is not a great program on a per-thread level – it works best when all cores are maxed. As a result, we have since moved to POV-Ray for more consistent testing. In most circumstances, POV-Ray seems to push the CPU harder than our Prime95 test ever did, drawing more power. We cannot adjust the number of threads in POV-Ray itself, but we discovered that through the command line we can set an affinity mask and limit exactly which cores the program can use.

We go into detail about how to set an affinity mask in our 2990WX CPU0 follow-up article. You can read about it at this link.

In this case, we can select which cores get the POV-Ray threads, and the threads cannot jump to other cores. So as long as the CPU has enumerated the cores appropriately, we can control crossing die boundaries very easily.
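
As a rough illustration of the affinity approach, the sketch below builds a hexadecimal mask for a chosen set of cores. It assumes SMT siblings are enumerated adjacently (logical processors 2n and 2n+1 belong to core n), which may not hold on every system. The function name and the printed `start /affinity` line are illustrative only, not the exact command used in testing.

```python
def affinity_mask(cores, threads_per_core=2):
    """Return a bitmask with one bit set per logical processor of each listed core.

    Assumption: core n owns the consecutive logical processor IDs
    n*threads_per_core .. n*threads_per_core + threads_per_core - 1.
    """
    mask = 0
    for core in cores:
        for thread in range(threads_per_core):
            mask |= 1 << (core * threads_per_core + thread)
    return mask


# Pin a load to the first four physical cores (eight logical processors).
mask = affinity_mask(range(4))
print(f"start /affinity {mask:X} <program>")
```

Because the mask is built from core numbers rather than typed by hand, stepping the load from one core up to all cores is a matter of widening the range.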

The way we gather data is still the same – taking the internal register values for power. This is one of the simplest ways to measure power, and it is actually the way the system itself determines power limits for reducing clock speeds or reacting to other system stimuli. It is a rough estimate, somewhere within 10% of the real value according to power experts, and may not be as accurate as physical testing. However, it has the advantage of giving more data, in a way that is easily accessible without additional hardware or setup time. With the method we use, we accept those risks and the possibility of variance from real values, but we have confidence that because the CPU uses these values for its own internal metrics, they should be good enough for us as well.
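
How a register value becomes a wattage figure is not detailed above. A common pattern, assumed here and not confirmed for the tooling used in this test, is that the register holds a cumulative energy counter, and average power is the energy difference between two samples divided by the elapsed time. The counter width, energy unit, and sample values below are all hypothetical.

```python
def average_power_watts(count_start, count_end, seconds,
                        joules_per_count, counter_bits=32):
    """Average power between two samples of a cumulative, wrapping energy counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Modular subtraction handles a single wraparound of the counter.
    delta_counts = (count_end - count_start) % (1 << counter_bits)
    return delta_counts * joules_per_count / seconds


# Hypothetical samples taken one second apart, assuming 2**-16 J per count.
print(average_power_watts(1_000_000, 12_796_480, 1.0, 2**-16))
```

Sampling over a longer interval smooths out short spikes, which is one reason register-based figures can differ from instantaneous physical measurements.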

Another change is in our definitions. Previously we separated the package power into ‘Core’ and ‘Uncore’ (or interconnect, such as Infinity Fabric). AMD’s internal registers actually give the values for total package power and per-core power, meaning that the difference between the package value and the sum of the per-core values is not only the uncore, but also the IO and the DRAM controllers. Some Intel chips partition out the DRAM and Uncore, but AMD does not, so we have to accept that the number we get is an amalgamation of these values. Based on the test, the IO and DRAM contribution should be roughly constant. For the sake of clarity, instead of labelling the value ‘Uncore’, it is now labelled ‘non-Core’.
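
The ‘non-Core’ definition amounts to a single subtraction, sketched below. The readings are hypothetical round numbers chosen for illustration, not measured data.

```python
def non_core_power(package_watts, per_core_watts):
    """Non-core power: package power minus the sum of the per-core readings."""
    return package_watts - sum(per_core_watts)


# Hypothetical readings: a 180 W package with sixteen cores at 9.375 W each.
print(non_core_power(180.0, [9.375] * 16))
```

Whatever is left after the subtraction includes Infinity Fabric, IO, and the DRAM controllers together, which is why the result cannot be split any further.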

For this update, we re-ran every Threadripper 2 processor with the new test to find the data.

Starting at the Top: 32-Core Threadripper 2990WX

The key stats on the 2990WX are the 32-core design and the 250W TDP. From our POV-Ray testing, we hit that 250W at around 20 cores loaded.

It is worth noting that we see a couple of bumps around 10 cores and 18 cores. If we split out the per-core values, we can see that actually the per-core power goes up for these settings.

The easiest difference to see is at the 16C level, where each core gets on average 9W and then goes up to 10W, despite more load being added. This is because we are crossing die boundaries, and as new threads are being enabled further away from where the others are, it allows more power to be pushed through without disrupting temperatures too much.

The 24-Core Setup: Threadripper 2970WX

Moving down to 24 cores, each of the four silicon dies now has six cores rather than eight, which means we cross those die boundaries earlier. This has benefits and issues worth mentioning.

Now this graph is not as linear as I was expecting. We see big rises at 7-8 core load, 13-14 core load, and 19-20 core load. The reason is the same as the 2990WX: we are crossing die boundaries when more threads are added.

In the per-core charts, this is much easier to see. At the first boundary at 7-8 core load, we go from 12W per core to 15W per core. At the second boundary at 13-14 core load, we go from 10W per core to 12-13W per core. The downside is that once we go past this initial two-core high, per-core power consumption drops again to counteract placing more threads on the same die.
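
The loads at which a die boundary is crossed follow from simple arithmetic. The sketch below assumes threads fill one die completely before moving to the next, which depends on the cores being enumerated in die order. The function names are illustrative.

```python
def dies_in_use(loaded_cores, cores_per_die):
    """Number of dies touched when cores are loaded in enumeration order."""
    return -(-loaded_cores // cores_per_die)  # ceiling division


def first_loads_on_new_die(total_cores, cores_per_die):
    """Loaded-core counts at which a thread first lands on a fresh die."""
    return [n for n in range(2, total_cores + 1)
            if dies_in_use(n, cores_per_die) > dies_in_use(n - 1, cores_per_die)]


print(first_loads_on_new_die(24, 6))  # 2970WX: six cores per die
print(first_loads_on_new_die(32, 8))  # 2990WX: eight cores per die
```

For six cores per die this gives loads of 7, 13, and 19 cores, which lines up with the rises at 7-8, 13-14, and 19-20 cores. For eight cores per die it gives 9, 17, and 25.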

In both the 2970WX and 2990WX, the ‘non-core’ power was around 53-55W of the total package power, even at low loads. This is roughly 20-22% of the total package power. Keep that number for later.

The Previous Recommendation: 16-Core Threadripper 2950X

With only one silicon boundary, we should expect the 2950X to be a reasonably sedate affair. This chip has a maximum TDP of 180W, and fewer IF links to draw power from the total package.

The 2950X power curve under POV-Ray is relatively smooth, and we top out at that 180W TDP value at around 10 cores loaded.

There are no surprises on the per-core values, except that in 1-2 core mode we actually push 21W through those Zen+ cores, compared to 6W per core when the 2990WX is fully loaded. That’s a large range in which the core has to be both efficient and performant.

The 12-Core Backup: Threadripper 2920X

A few years ago, offering a 12-core CPU on a consumer platform would have been a dream come true. It seems odd these days that the lowest CPU in this test is a 12-core. But here we are.

Similar to the 2950X, the power curve is smooth and regular, and we again hit the 180W TDP around 9-10 core load. This time we really need to use all the chip to get to the top power levels.

What is interesting about the per-core values for the 2920X is that it isn’t until 8C fully loaded that we actually see a proper per-core power drop – at one core load, that core is drawing 18W, and at 7C load, the cores are still drawing 17-18W each. That’s a good environment for keeping frequencies high, as I’m sure we’ll see in some of the benchmarks.

Going back to the non-core power, on these last two chips, we see the non-core power around 30W, which is only 16-17% of the total CPU power when fully loaded. This is just over half the power of the quad-die processors, mainly because of fewer Infinity Fabric links.
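
The percentages quoted for non-core power can be checked with a short calculation. This sketch uses each chip's TDP as a stand-in for full-load package power, which is an approximation rather than a measured value.

```python
def share_percent(part_watts, total_watts):
    """Part as a percentage of the total."""
    return 100.0 * part_watts / total_watts


# Figures quoted in the text: non-core power against each chip's TDP.
for label, non_core, tdp in [("quad-die, low estimate", 53, 250),
                             ("quad-die, high estimate", 55, 250),
                             ("dual-die", 30, 180)]:
    print(f"{label}: {share_percent(non_core, tdp):.1f}%")
```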

Peak Power Comparison

So for our peak power comparison tests, we take the peak full package power value from our per-core testing. Again we are using POV-Ray as our load generator.

Power (Package), Full Load

All the TR2 processors stay within TDP in our testing, but this is down to how AMD defines its TDP.


70 Comments


  • schujj07 - Monday, October 29, 2018 - link

    You would be far too limited with RAM to run 60 VMs on that system. I've got 80 on dual Dell 7425's with dual 24 Core Epycs and 512GB RAM and I'm already getting RAM limited.
    Again I wouldn't install ESXi on these. Use Win 10 and Workstation for your test/dev and you will have a more agile system. If you don't need it for testing that day you still have Windows. FYI I'm VMware Admin.
    Reply
  • Ratman6161 - Monday, October 29, 2018 - link

    All depends...in my home lab environment (which lets me test things at will and do whatever I want as opposed to at work where even the lab is more locked down) . For me, the Threadrippers would be great...but extreme overkill. I actually use old FX8320's which I bought when they were dirt cheap and DDR3 RAM was cheap too. The free version of ESXi works fine for me too. For my purposes the threadrippers would be really cool but more expensive than they would be worth. Reply
  • Icehawk - Monday, October 29, 2018 - link

    I would love one of these high cores boxes for our test lab, using W10 and VM on my desktop is very limiting for me (work rig is 7700 & 32gb) - one of these would let me put plenty of resources onboard. Currently my lab runs off a G6 Dell server which is totally fine but if I could get myself a new, personal, lab I'd want a TR rig since it can host a lot more RAM than Intel's option. Reply
  • odrade - Tuesday, October 30, 2018 - link

    Hi I completely agree with you.

    With security enhancements moving to sandboxes/VMs (Application Guard, Sandboxed Defender in 19H1), virtualization scenarios will be more prevalent beyond developer or test scenarios.

    One major disappointment is that after 12+ months since GA there is no support for nested virtualization for TR/TR2 ?, Ryzen ? Epyc ?.

    This issue seems to be general and not limited to hyper-v (KVM, etc..).

    This is strange since EPYC made its way through Azure or Oracle Cloud catalog.
    During Ignite 2018 there was a demo with an EPYC box (VM or Server).

    Regards G.
    Reply
  • GreenReaper - Wednesday, October 31, 2018 - link

    You could ask for HyperV over here:
    https://windowsserver.uservoice.com/forums/295047-...

    But such features are often buggy in their initial implementations:
    http://www.os2museum.com/wp/vme-broken-on-amd-ryze...
    https://www.reddit.com/r/Amd/comments/8ljgph/has_t...

    It wouldn't surprise me if they ran into too many problems to want to push out a solution. And Intel has had issues here too - most recently L1 Terminal Fault relating to EPT:
    https://www.redhat.com/en/blog/understanding-l1-te...

    If people buy enough of them, and there is a performance benefit or it otherwise becomes a feature differentiator, support will doubtless be developed. Chicken and egg, I know.
    Reply
  • odrade - Monday, November 5, 2018 - link

    Hi,
    Thanks for your input.
    This feature is handy if you want to build advanced lab scenarios while preserving your work environment or avoid the hassle to use dual boot.
    Maybe this feature will be enabled with the 2019 Epyc / TR iteration.

    And if the socket and compatibility promises are kept by AMD, refreshing
    my setup will do it and put those extra pcie lanes to use (upgrading storage as well).
    At least the 7nm process will help to keep the power compatibility in line.

    Regards G.
    Reply
  • Blindsay - Monday, October 29, 2018 - link

    For the chart on the last page, the "12-core Battle" it would be interesting to see a "similar price battle" of like the 9900k vs 7820X vs 2920X. I suspect the 9900k would hold up rather well especially once it returns to its SRP Reply
  • mapesdhs - Monday, October 29, 2018 - link

    A battle for what? If it's gaming, get the far cheaper 2700X and use the difference to buy a better GPU, giving better gaming results by default (some niche cases at 1080p, but in general the 9900K is a poor value option for gaming, except for those who've gone the NPC route into high refresh displays from which there's no way back, ironic now NVIDIA has decided to move backwards to sub-60Hz 1080p with RTX). Reply
  • Blindsay - Monday, October 29, 2018 - link

    Definitely not for gaming lol. It is for a home server (unraid) Reply
  • PeachNCream - Tuesday, October 30, 2018 - link

    That's a lot of compute for a home server. Home servers (outside of those used for the development of professional skills or to test software outside of a setting where there are office usage policies) serve very limited useful purposes. They're mainly a solution looking for a problem or just fun to mess around with. I have an old C2D E8400-powered desktop PC with 8GB of RAM that I just recently put online as a local file, media, and internal web server connected via a cheap TPLink PCI (non-e) wifi card. There's nothing that the kids and I have done to it yet that brings it anywhere close to its knees. Even streaming videos from it to three other systems at once is a non-issue and all of those files are stored on a single 1TB 5400 RPM 2.5 inch mechanical HDD. TR is extreme overkill for a toy server at home. Literally any old scavenged desktop or laptop can act as a home server. Reply
