Server CPUs in 2010

AMD’s best core in 2010 is a slightly improved revision of the current six-core Opteron “Istanbul” with the following additions:

• Finally a “real” C1E state which reduces power for each core that is idling
• Support for DDR-3

In theory, DDR-3 1333 offers 66% higher bandwidth, but in practice the Stream benchmark does not measure more than a 25% boost in bandwidth. The latency of going off-die is about the same. That means that the performance increase in most server applications will not be tangible. Only the most bandwidth-intensive HPC applications will get a boost of 10 to 20%.
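As a rough check of the 66% figure: the theoretical peak of a memory channel is its transfer rate multiplied by the bus width. The sketch below assumes that the baseline is DDR2-800 (the comparison point is not named above) and that each channel is 64 bits wide; the function name is purely illustrative.

```python
# Theoretical peak bandwidth of one memory channel.
# Assumptions (not stated in the article): the baseline is DDR2-800,
# and each channel is 64 bits (8 bytes) wide.

BYTES_PER_TRANSFER = 8  # 64-bit channel


def peak_bandwidth_mb_s(mega_transfers_per_s: float) -> float:
    """Return the theoretical peak bandwidth in MB/s for one channel."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER


ddr2_800 = peak_bandwidth_mb_s(800)
ddr3_1333 = peak_bandwidth_mb_s(1333)
theoretical_gain = ddr3_1333 / ddr2_800 - 1

print(f"DDR2-800:  {ddr2_800:.0f} MB/s per channel")
print(f"DDR3-1333: {ddr3_1333:.0f} MB/s per channel")
print(f"Theoretical gain: {theoretical_gain:.1%}")
```

Under these assumptions the theoretical gain comes out just under 67%, which matches the "66% in theory" figure. The much smaller boost measured by Stream is a measured result and cannot be derived from this arithmetic.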

Currently, AMD's six-core Opteron can match the performance of Intel’s quadcore Xeon 5500 at the same clockspeed in some important server applications: OLAP databases, virtualization and web applications. Intel’s best Xeon wins by a significant margin in OLTP, ERP and rendering. A large part of the HPC market is a lost cause: a quadcore Intel Xeon 5570 at 2.93 GHz is about twice as fast as an AMD Opteron 2389 at 2.9 GHz. The fact that we could not find any Opteron 2435 results in LS-Dyna is another indication of what to expect: the 10-20% higher performance in HPC applications will not be a large step forward.

Intel is going to increase performance by 20-30% per CPU (50% more cores), while AMD’s CPUs will see only marginal increases. So basically, Intel’s performance advantage is going to grow by 20 to 30%, except in HPC workloads where it is already running circles around the competition. Not an enviable position to be in for AMD.
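The estimate above implies that the extra cores are far from scaling linearly. A quick sketch of that arithmetic, using only the 50% core increase and the 20-30% performance estimate quoted above; the "share of linear scaling" ratio is an illustrative measure, not one defined in this article, and it ignores any clock speed or cache differences.

```python
# How much of the added core count shows up as added performance?
# Inputs are the article's estimates; the ratio itself is illustrative.

core_increase = 0.50            # six cores instead of four
perf_gain_range = (0.20, 0.30)  # estimated per-CPU performance gain

for perf_gain in perf_gain_range:
    efficiency = perf_gain / core_increase
    print(f"{perf_gain:.0%} gain from {core_increase:.0%} more cores "
          f"-> {efficiency:.0%} of linear scaling")
```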

Suppose that you are the strategic brain behind AMD. The competition offers better “per chip” and “per core” performance. The last thing you want to do is to offer the same kind of server platform. If a six-core Opteron (“Lisbon”) goes head to head with a six-core Xeon (“Westmere EP”), it will not be pretty: the Intel chip will beat the AMD chip in performance and performance/watt (remember, Westmere EP is a 32 nm CPU). Despite this, AMD found some clever ways to make their server platforms interesting…

Cheaper 4-Socket Servers

 

“Know your enemies and know yourself”.

In which usage scenarios are Intel’s offerings less compelling? The Nehalem-EX is a powerful platform, but it is also a completely different one from the “Westmere EP” platform. The Nehalem-EX's most important market is the 4-socket/8-socket x86 market, where about 400,000 servers are sold per year, or about 5% of the total x86 server market. It is also a pretty complex platform with two I/O hubs and 16 (!) memory buffer chips on a 4-socket board. The Nehalem EX platform does not only want to conquer the high end 4 and 8-socket x86 server market, it also wants to convince the more paranoid RISC and Itanium buyers.
 
AMD uses the same building blocks for its midrange 4-socket platform as it does for the high-end 2-socket platform and calls it the G34 infrastructure. The consequence is that the RAS features stay the same, and as a result, AMD cannot completely compete with the Nehalem EX platform when it comes to RAS. But that is not really a problem, as some of the "high-end" RAS features aren't used by 98% of the x86 crowd who buy the more expensive 2-socket and 4-socket servers. To compete with the 8 core/16 thread Nehalem EX, AMD puts two DDR3 Istanbuls together, which communicate via a HyperTransport link, and calls the result a twelve-core Opteron 6100 (Socket G34). A server based on the Opteron 6100 can probably come close to the performance of the lower-end and midrange Nehalem EX, but it is a lot cheaper to design and produce. The disadvantage is that it only has 12 DIMM slots per CPU, while the Nehalem EX has 16 DIMM slots per CPU.

Our first impression is that AMD will find it hard to win the high end database and ERP market. The quadcore Nehalem 5500 already outperforms the six-core Opteron “Istanbul” by a large margin (30-50%). The Opteron 6100 again has 50% more cores than its Intel rival (12 versus 8), but it is likely that a “native octalcore” will scale a bit better than a design made of two six-core dies. For the virtualization market, the larger number of DIMM slots is an advantage for the Nehalem EX. At first sight, it looks like it will be pretty tough for AMD to regain market share in this part of the server market.
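To put the core and DIMM slot trade-off in per-server terms, here is a small sketch that multiplies the per-CPU figures quoted above by four sockets. The dictionary layout and variable names are illustrative only; the numbers are the ones given in the text.

```python
# Per-server totals for a 4-socket board, using the per-CPU figures
# quoted in the article. The dictionary layout is illustrative only.

SOCKETS = 4

platforms = {
    "Opteron 6100": {"cores": 12, "dimm_slots": 12},
    "Nehalem EX": {"cores": 8, "dimm_slots": 16},
}

# Multiply every per-CPU figure by the socket count.
totals = {
    name: {key: per_cpu * SOCKETS for key, per_cpu in spec.items()}
    for name, spec in platforms.items()
}

for name, total in totals.items():
    print(f"{name}: {total['cores']} cores, {total['dimm_slots']} DIMM slots")

core_advantage = totals["Opteron 6100"]["cores"] / totals["Nehalem EX"]["cores"] - 1
dimm_advantage = totals["Nehalem EX"]["dimm_slots"] / totals["Opteron 6100"]["dimm_slots"] - 1
print(f"Opteron 6100 core advantage: {core_advantage:.0%}")
print(f"Nehalem EX DIMM slot advantage: {dimm_advantage:.0%}")
```

On these figures a 4-socket Opteron 6100 server offers 48 cores and 48 DIMM slots against 32 cores and 64 DIMM slots for the Nehalem EX, so AMD leads by 50% in cores while Intel leads by about a third in DIMM slots.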

34 Comments


  • Zool - Thursday, November 26, 2009

    The desktop Phenom II X4 925 in 1000 quantities from AMD's site is 145 USD. For the Opteron 8300 series (similar cache and die area to the Phenom II), the lowest priced model is 523 USD and the highest priced model, the Quad-Core AMD Opteron 8393 SE, costs 2649 USD.
    The wafer cost for the 145 USD CPU is the same as for the 2649 USD CPU. If the die areas are similar, then the actual manufacturing costs (same machine usage, same workforce, etc.) should be almost identical.
    So now they are selling the Phenom II X4 925 for 145 USD and I assume that they have some margin even on these models. So let's say 25 USD is the margin and 120 USD the cost.
    So for the Quad-Core AMD Opteron 8393 SE the margin will be 2529 USD. Now wait a moment biatch. THAT'S 101 TIMES more than for the almost identical Phenom II X4 925. For an average Opteron they get around 50 times more money than for the same low end desktop part. The same story goes for Intel server CPUs.
    No wonder they can SHIT on low cost desktop CPUs. The whole roadmap is a mess about cores and manufacturing process for server CPU derivatives.
  • vsary6968 - Thursday, November 26, 2009

    Show me the benchmark where the Nehalem-EX beats Magny-Cours. Don't state something about a product that is not out yet. This is hurting other forum threads.
  • james775 - Tuesday, November 24, 2009

    is now up and available at:

    http://www.amdzone.com/phpbb3/viewtopic.php?f=52&a...
  • Chlorus - Tuesday, November 24, 2009

    I'm sure a website titled "AMDZone" will be objective and nonbiased.
  • james775 - Tuesday, November 24, 2009

    sure, it's unbiased just like this article.

    http://bit.ly/8BX9UG

    happy? =))
  • james775 - Tuesday, November 24, 2009

    http://bit.ly/6Id6y0
  • Zool - Tuesday, November 24, 2009

    Huh "The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space".
    They finally found out that the amount of overhead that they add to each execution core, which actually does the real work (something like 1/5 of the core logic die size), is not worth duplicating x times with each core.
    Maybe if they would make the pipelines much shorter and add only very basic prefetch, decode and branch prediction logic, the amount of performance for the transistor budget would be quite shocking.
    I mean, how much slower would an AMD Thunderbird core at 4 GHz be than a current single Nehalem core?
    If u download this cpu test program with the results (link: http://testcpu.webz.cz/index.htm) u can compare your result with old CPUs. The program is quite old but that means it's quite fair too.

    A single core Wolfdale 3.2 GHz:
    Dhry=10712575
    Whet=2372478
    Mips=7160629
    Mflops=995667

    AMD Athlon 1100 MHz (22 mil transistors):
    Dhry=2220351
    Whet=692956
    Mips=2382066
    Mflops=300902

    That's a Wolfdale around 60% faster at the same clocks than the 22 mil transistor Athlon (need to note that the L2 cache was on the CPU board :) )
    Just want to say that the several times more complex logic and die size increase gives you quite disappointing results.
    So someone out there could finally make real low power, high frequency CPUs and not chase CPU cores.
  • freezervv - Wednesday, November 25, 2009

    "The program is quite old but that means it's quite fair too."

    "Just want to say that the several times more complex logic and die size increase gives you quite disappointing results."

    Umm, isn't that why people in the real world use efficient ISA extensions?
  • Zool - Wednesday, November 25, 2009

    "Umm, isn't that why people in the real world use efficient ISA extensions?"
    Pentium 3 already had SSE with 128-bit registers. Upgrading to SSE3 wouldn't be a big deal. Intel Atom supports everything up to SSSE3.
  • Zool - Wednesday, November 25, 2009

    "The program is quite old but that means it's quite fair too."
    The problem is that comparing old CPUs to current ones only works in old programs that have minimal external bandwidth requirements, or in some minimal command prompt tests. If u would test the AMD 1100 MHz and the Core 2 Duo Wolfdale in, for example, Cinebench 10, the difference would be much bigger. The AMD 1100 can't keep up with the 10+ times higher external memory bandwidth of the Core 2 Duo. The situation would be the same in real world applications: with such slow external bandwidth the old CPUs are very slow, but that doesn't mean the IPC is that much slower.
    I just want to say that AMD and Intel had several years to release a normal low power CPU without the insane die overhead of current CPUs. And they did a big nothing. It could reach 70-80 percent of core performance for a fraction of the current die area (the rest could be gained through a 30% frequency increase :) ).
    The current CPU designs increase IPC by 20-30 percent through an insane amount of complications and die size, when they could just increase frequency by that amount with the right cheap design.
