Expensive 2-Socket Servers

When it comes to expensive 2-socket servers, AMD's positioning is cunning. In the midrange we will find servers with sixteen Opteron cores (2 sockets x 2 quad-core dies per socket) offering 8 memory channels and 24 DIMM slots. Performance will probably be “close enough” to the Westmere-EP servers, which can only offer six memory channels and 18 DIMM slots. The extra memory bandwidth might make a dual Opteron 6100 attractive to the HPC folks, while the higher number of DIMM slots together with a competitive price may very well convince the virtualization market.
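To put the platform gap in numbers, here is a minimal Python sketch that uses only the channel and slot counts quoted above. The dictionary and function names are our own, and more channels do not translate one-to-one into more bandwidth, since supported memory speeds differ between platforms.

```python
# Per-server (2-socket) platform figures quoted in the text above.
opteron_6100 = {"memory_channels": 8, "dimm_slots": 24}
westmere_ep = {"memory_channels": 6, "dimm_slots": 18}

def relative_advantage(a: int, b: int) -> float:
    """Fraction by which a exceeds b (0.25 means 25% more)."""
    return a / b - 1.0

for key in opteron_6100:
    adv = relative_advantage(opteron_6100[key], westmere_ep[key])
    print(f"{key}: {adv:.0%} more on the Opteron 6100 platform")

# DIMM slots per memory channel for each platform.
opteron_dpc = opteron_6100["dimm_slots"] / opteron_6100["memory_channels"]
westmere_dpc = westmere_ep["dimm_slots"] / westmere_ep["memory_channels"]
print(opteron_dpc, westmere_dpc)
```

Both platforms work out to three DIMM slots per channel, so the Opteron's slot advantage comes entirely from the extra memory channels.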

Midrange and Budget 2-Socket Servers

 
When it comes to the midrange of the 2-socket market, AMD has no choice: it must compete on price. There is no way an Opteron 4100 (“Lisbon”, socket C32) is going to be competitive with Westmere-EP at the same clockspeed. As we noted before, the former will be a few percent faster than the current six-core Opteron “Istanbul”, while the Westmere chip is at least 20% faster than its famous older brother. The fastest Lisbons are probably not even going to be able to keep up with the low-clocked six-core Westmere-EPs. So the Opteron 4100 and the “San Marino” platform have only one mission: to be a lot cheaper than the low-end Westmere servers. To increase the performance/watt ratio, the San Marino servers do not support the 105-137W SE CPUs. This ensures that the server vendors do not have to overbuild the voltage regulators and PSU, which in turn lowers the overall power a server consumes when running with 75W ACP parts.

Ultra Low Power Server

AMD had some success in the ultra low power market and clearly wants more. The “Adelaide” platform is the successor to the power-optimized “Kroner” platform. Low power memory and chipset, voltage regulators and PSUs that only support low power Opterons: every component is tuned for low power. Remarkably, the ACP of the Opteron 4100 EE is lowered to a very low 35W, or less than 6W per core. AMD feels these CPUs offer an excellent alternative to the VIA Nano and Intel Atom based servers. Instead of running one small website on an Intel Atom based server, AMD hopes that ISPs will prefer to run 6 websites on a container based solution. So each website would get its own 6 Watt core, which is much more powerful than the best Intel Atom CPUs.
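The per-core figure is simple division, sketched below in Python. The six-core count is our assumption, inferred from the "6 websites, one core each" scenario above. ACP also covers the whole package (memory controller, L3 and so on), so the result is an average budget per core, not a measured core power.

```python
ACP_WATTS = 35.0     # Opteron 4100 EE ACP, as quoted in the text
CORES_PER_CPU = 6    # assumption: six-core part, one core per website

def watts_per_core(acp_watts: float, cores: int) -> float:
    """Average power budget per core, in watts."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return acp_watts / cores

per_core = watts_per_core(ACP_WATTS, CORES_PER_CPU)
print(f"{per_core:.2f} W per core")  # prints "5.83 W per core"
```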

Upgrade to Bulldozer

AMD’s C32 and G34 platforms will be upgradeable to the new Valencia and Interlagos CPUs, which are both based on the Bulldozer core.
 
 
 

We will discuss this core in more detail later, but here are some extra tidbits we managed to find out:

• Two integer clusters share fetch and decode logic but have their own dedicated instruction and data caches
• Integer clusters cannot be shared between threads: the integer cores act like a Chip Multi Processing (CMP) CPU.
• The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
• L1 caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
• Up to 4 modules share an L3 cache and Northbridge
• Two times 4 Bulldozer modules (2 x 8 "cores" or 16 cores) are about 60 to 80% faster than the twelve-core Opteron 6100 CPU in SPECint_rate.
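As a rough sanity check on the last tidbit, the Python sketch below converts the claimed 60 to 80% chip-level gain into an implied per-core gain. It assumes that SPECint_rate throughput scales linearly with core count and it ignores clock speed differences. Both are simplifications of ours, not AMD figures.

```python
def per_core_ratio(total_speedup: float, new_cores: int, old_cores: int) -> float:
    """Implied per-core throughput ratio, assuming linear scaling with cores."""
    return total_speedup * old_cores / new_cores

# 16 Bulldozer cores versus the twelve-core Opteron 6100.
low = per_core_ratio(1.60, new_cores=16, old_cores=12)
high = per_core_ratio(1.80, new_cores=16, old_cores=12)
print(f"Implied per-core gain: {(low - 1) * 100:.0f}% to {(high - 1) * 100:.0f}%")
```

Under those assumptions, each Bulldozer integer core would deliver roughly 20 to 35% more throughput than a core in the Opteron 6100.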

With Bulldozer, AMD finally seems to have designed an aggressive integer core. Since the introduction of the Intel Woodcrest in 2006, Intel’s CPUs have been offering superior integer crunching performance per core. Since integer performance determines the performance of 90-95% of the server applications out there, this is a big deal.

Conclusion

Intel has a very strong product lineup for each segment of the market: the massive octal-core Nehalem EX for the “mission-critical” high-end, the six-core Westmere-EP for the midrange and the “Lynnfield” based Xeons for the low power market. But AMD doesn't roll over willingly: it breaks all market segment rules and shatters some (artificial?) boundaries. That will result in some very interesting opportunities for server buyers in 2010.

So which products are worth watching or waiting for? The G34 Opteron 6100 will find a home in 48-core servers, and these servers should be a cheaper alternative to the 32-core Nehalem EX servers in the high-end. We are not completely convinced that performance and RAS features will be compelling enough to sway the typical Nehalem EX buyers (OLTP, ERP) towards an AMD Opteron server. That is our first impression, but we will give AMD the benefit of the doubt of course.

We are much more enthusiastic about AMD’s high-end 2-socket platform. The fact that you will be able to buy a relatively cheap (compared to 4-socket solutions) 2-socket server with two quad-channel octal-core CPUs is very attractive and a great strategic move by AMD. A platform with 16 cores (or 24 if you like) and 24 DIMM slots might attract quite a lot of typical 2-socket “virtualization consolidation” server buyers.

The other really compelling offer to the market might be the Adelaide platform, depending on how high the premium is that AMD wants for its EE Opterons. AMD has been asking pretty high prices for its lowest power Opterons, clearly targeting the "Facebooks" and "Googles" of the world. But if AMD is going after the Intel Atom server market, it may mean that it's going to offer some low power products in price ranges that are interesting to the rest of us.

Server CPUs in 2010

34 Comments


  • Zool - Thursday, November 26, 2009 - link

    The desktop Phenom II X4 925 in 1000 quantities from AMD's site is 145 USD. For the Opteron 8300 series (similar cache and die area to the Phenom II), the lowest priced model is 523 USD and the highest priced model, the Quad-Core AMD Opteron 8393 SE, costs 2649 USD.
    The wafer cost for the 145 USD CPU is the same as for the 2649 USD CPU. If the die areas are similar, then the actual manufacturing costs (same machine usage, same workforce, etc.) should be almost identical.
    So now they are selling the Phenom II X4 925 for 145 USD and I assume that they have some margin even on these models. So let's say 25 USD is the margin and 120 USD the cost.
    So for the Quad-Core AMD Opteron 8393 SE the margin will be 2529 USD. Now wait a moment biatch. THAT'S 101 TIMES more than for the almost identical Phenom II X4 925. For an average Opteron they get around 50 times more money than for the same low-end desktop part. The same story for Intel server CPUs.
    No wonder they can SHIT on low-cost desktop CPUs. The whole roadmap is a mess of cores and manufacturing processes for server CPU derivatives.
  • vsary6968 - Thursday, November 26, 2009 - link

    Show me the benchmark where the Nehalem-EX beats Magny-Cours. So don't state something about a product that is not out yet. This is hurting other forum threads.
  • james775 - Tuesday, November 24, 2009 - link

    is now up and available at:

    http://www.amdzone.com/phpbb3/viewtopic.php?f=52&a...
  • Chlorus - Tuesday, November 24, 2009 - link

    I'm sure a website titled "AMDZone" will be objective and nonbiased.
  • james775 - Tuesday, November 24, 2009 - link

    sure, it's unbiased just like this article.

    http://bit.ly/8BX9UG

    happy? =))
  • james775 - Tuesday, November 24, 2009 - link

    http://bit.ly/6Id6y0
  • Zool - Tuesday, November 24, 2009 - link

    Huh "The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space".
    They finally found out that the amount of overhead they add to each execution core (the part that actually does the real work, something like 1/5 of the core logic die size) is not worth duplicating x times with each core.
    Maybe if they made the pipelines much shorter and added only very basic prefetch, decode and branch prediction logic, the amount of performance for the transistor budget would be quite shocking.
    I mean, how much slower would an AMD Thunderbird core at 4 GHz be than a current single Nehalem core?
    If you download this CPU test program with the results (link: http://testcpu.webz.cz/index.htm) you can compare your result with old CPUs. The program is quite old but that means it's quite fair too.

    A single core Wolfdale 3.2 GHz:
    Dhry=10712575
    Whet=2372478
    Mips=7160629
    Mflops=995667

    AMD Athlon 1100 MHz (22 million transistors):
    Dhry=2220351
    Whet=692956
    Mips=2382066
    Mflops=300902

    That's around 60% faster for the Wolfdale at the same clocks than the 22 million transistor Athlon (need to note that the L2 cache was on the CPU board :) )
    Just want to say that the several times more complex logic and die size increase gives you quite disappointing results.
    So someone out there could finally make real low power, high frequency CPUs and not chase CPU cores.
  • freezervv - Wednesday, November 25, 2009 - link

    "The program is quite old but that means it's quite fair too."

    "Just want to say that the several times more complex logic and die size increase gives you quite disappointing results."

    Umm, isn't that why people in the real world use efficient ISA extensions?
  • Zool - Wednesday, November 25, 2009 - link

    "Umm, isn't that why people in the real world use efficient ISA extensions?"
    Pentium 3 already had SSE with 128-bit registers. Upgrading to SSE3 wouldn't be a big deal. Intel Atom supports everything up to SSSE3.
  • Zool - Wednesday, November 25, 2009 - link

    "The program is quite old but that means it's quite fair too."
    The problem is that comparing old CPUs to current ones only works in old programs that have minimal external bandwidth requirements, or in some minimal command prompt tests. If you tested the AMD 1100 MHz and the Core 2 Duo Wolfdale in, for example, Cinebench 10, the difference would be much bigger. The AMD 1100 can't keep up with the 10+ times higher external memory bandwidth of the Core 2 Duo. The situation would be the same in real world applications: with such slow external bandwidth the old CPUs are very slow, but that doesn't mean the IPC is that much lower.
    I just want to say that AMD and Intel had several years to release a normal low power CPU without the insane die overhead of current CPUs. And they did a big nothing. It could reach 70-80 percent of core performance for a fraction of the current die area (the rest could be gained through a 30% frequency increase :) ).
    The current CPU designs increase IPC by 20-30 percent through an insane amount of complications and die size when they could just increase frequency by that amount with the right cheap design.
