We were quite amazed, even slightly suspicious, when HP and Fujitsu-Siemens published their SAP numbers. These numbers showed that the newest Xeon X5570 (Nehalem EP) series offers an enormous performance boost over the Xeon X5470 (Harpertown). After all, an almost 100% improvement at a slightly lower clockspeed (2.93 GHz vs 3.3 GHz) is nothing short of amazing. It turns out that the real clockspeed is 3.2 GHz (2.93 GHz + 266 MHz turbo), but that does not alter the fact that these are truly incredible performance numbers.

I can now confirm that there are no tricks behind these numbers: they paint the right picture of the Xeon Nehalem EP. Talking to SAP benchmarking specialists, it became clear that few tuning tricks exist that are not known to the big OEMs. The benchmark has been analyzed and tuned so well that even the use of a different database (for example MS SQL instead of DB2) only makes a 2 to 3% difference most of the time. So you can even compare SAP numbers obtained on different databases. To sum up, the SAP numbers can only really be boosted by better hardware (CPU and memory).
 
Now why am I talking so much about SAP benchmarking numbers? It is not as if this expensive ERP software is run by everyone.
 
Well, the SAP numbers show a dual 2.93 GHz (or 3.2 GHz with turbo) Xeon beating the only quad AMD 8384 (Shanghai at 2.7 GHz) score we have so far, which is 22000. Granted, a blade server is a bit slower most of the time. Still, four AMD 8384 CPUs at 2.7 GHz will be in the same league as a dual Xeon X5570, which will be out very soon now.
 
Even worse for AMD is that the SAP benchmark is not some exotic, exceptional benchmarking case for the Xeon 55xx series. It should be no surprise that the HPC numbers will be very impressive too. So it looks like AMD is in a tough spot.
 
What happened? 
As the SAP threads share a lot of data (as is typical for this kind of database-driven application), hyperthreading cannot be the only explanation for why Nehalem is simply doubling performance and annihilating the competition. SAP benchmarking specialists expect hyperthreading to be good for about one third of the performance boost. We tend to believe these people, who have been running this benchmark for years now. The reason why it is not one of the "top cases" for hyperthreading on Nehalem is that this OLTP-based benchmark spends a lot of time on shared data. Our own Nehalem OLTP benchmarking (Oracle and MySQL) also points in that direction.
 
As we have pointed out before, the benchmark also
  • responds very well to low cache and memory latency
  • does not care too much about memory bandwidth
  • and is very sensitive to "syncing latency".
Since the AMD Shanghai CPU has the same fast way to sync between cores (via the L3-cache) as Nehalem, syncing latency cannot explain why AMD falls behind. Another explanation is of course that these benchmarks are run on a CPU which uses turbo, which accounts for about a 5% advantage as the Nehalem CPU actually runs at 3.2 GHz.
 
Nehalem has faster access to the memory than AMD's latest quadcore (70 ns vs 110 ns), which is probably the second reason why Shanghai falls behind. But AMD will probably have to redesign its integer execution pipeline significantly before it will catch up with Nehalem (think memory disambiguation for example). Basically, AMD's better platform (NUMA with an integrated memory controller) was hiding this disadvantage. Now that the new Intel platform does not put "the brakes" on the integer execution engine anymore, the superiority of Intel's integer engine is showing.
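
To put the latency figures above in perspective, it can help to express them in core clock cycles. The sketch below is only a back-of-the-envelope conversion: it assumes the 70 ns and 110 ns figures apply at the clockspeeds mentioned in this post (3.2 GHz for the Xeon with turbo, 2.7 GHz for the Opteron 8384), and the function name is made up for illustration.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a memory latency in nanoseconds to core clock cycles."""
    # 1 GHz equals one cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz


# Assumption: latencies and clockspeeds as quoted in the post.
nehalem_cycles = latency_in_cycles(70, 3.2)    # Xeon X5570 with turbo
shanghai_cycles = latency_in_cycles(110, 2.7)  # Opteron 8384

print(f"Nehalem:  {nehalem_cycles:.0f} cycles per memory access")
print(f"Shanghai: {shanghai_cycles:.0f} cycles per memory access")
print(f"Shanghai waits {shanghai_cycles / nehalem_cycles:.2f}x as many cycles")
```

Under these assumptions a memory access costs roughly 224 cycles on Nehalem and 297 cycles on Shanghai, so the AMD core spends about a third more cycles waiting on each access even though it runs at a lower clock.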
 
The lack of any form of multi-threading is hurting AMD badly. It is well known that most of these business applications achieve very low IPC (0.2-0.6) and that modern superscalar CPUs have ample execution resources for running two threads in these applications. The result is that Simultaneous Multi Threading offers a typical 20 to 40% performance advantage. And that is huge, considering that you need 25 to 50% more clockspeed to counter that. It is basically a mission impossible for a modern CPU without SMT to outperform a similar superscalar CPU with SMT in OLTP, Java, webserver, rendering and ERP workloads. AMD really dropped the ball there: SMT should have been part of the K10 architecture.
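
The step from a 20 to 40% SMT advantage to 25 to 50% more clockspeed can be sketched with a simple linear model. One reading that is consistent with both pairs of figures is that only about 80% of a clockspeed increase shows up as extra performance in these workloads. That 0.8 factor is an assumption inferred from the numbers above, not a measured value, and the function name is hypothetical.

```python
def clock_increase_needed(smt_gain: float, clock_scaling: float = 0.8) -> float:
    """Fractional clock increase a non-SMT CPU needs to match an SMT gain.

    smt_gain      -- throughput gain from SMT, e.g. 0.20 for 20%
    clock_scaling -- assumed fraction of a clock increase that turns into
                     performance (1.0 would be perfect scaling)
    """
    if not 0 < clock_scaling <= 1:
        raise ValueError("clock_scaling must be in (0, 1]")
    # Linear model: performance gain = clock_scaling * clock increase.
    return smt_gain / clock_scaling


for gain in (0.20, 0.40):
    needed = clock_increase_needed(gain)
    print(f"SMT gain {gain:.0%} -> about {needed:.0%} more clockspeed")
```

With perfect scaling (clock_scaling=1.0) the required increase would simply equal the SMT gain; the lower the scaling factor, the more clockspeed a CPU without SMT has to find.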
 
Difficult times ahead for AMD
Even if AMD is able to speed up beyond 3 GHz, chances are slim that AMD will be able to compete with the new Nehalem Xeons. Add Turbo mode, hyperthreading, a lower latency memory controller and a better integer core together and you get a performance gap the size of the "Grand Canyon".
 
So does AMD have any chance at all before a new architecture arrives in 2011? Is it over and out for AMD in 2009 and 2010? Adding 2 cores at the end of 2009 is a good step in the right direction. But even if AMD executes flawlessly, the 32 nm Xeon Westmere will only give a window of a few months to the AMD hexacore "Istanbul". Istanbul should appear at the end of 2009, and the Westmere Xeon is scheduled for very early 2010.
 
Westmere has few performance optimizations; it seems to be a pretty straightforward shrink. Expect slightly higher clockspeeds, about 20% lower power consumption, and yet another addition to the ridiculously long list of SSE-instructions in the form of seven new instructions (six of which are for crypto/AES acceleration). Westmere is only an evolutionary step forward, but the "Grand Canyon" gap that Nehalem EP has made is probably large enough.

 

We will certainly see better (lower) switching times from virtual machine to hypervisor and some small tweaks in AMD's Istanbul CPU, but it remains unclear whether there are any significant performance boosters in the core. So it looks like Intel will own the dual socket space throughout 2009 and 2010, if we may believe the current roadmaps.
 
As the SAP numbers indicate, even the slowest Intel Xeons will show a large performance gap with the best AMD Opterons. Is AMD doomed completely? In a large part of the market, yes. AMD's Istanbul will make the gap a bit smaller but probably not small enough.
 
There are some unknown factors that, together with one of the few remaining weaknesses (or rather less strong points) of Nehalem, might allow AMD's Opteron to come close enough in one particular area of the market. In my next post, I will clarify the one and only opportunity that I see for AMD in the next two years. Until then, don't shoot the messenger :-).

35 Comments

  • stonedsurd - Thursday, February 19, 2009

    Enough with the AT-is-Intel-sponsored crap, please.

    Most of us (with brains in our skulls) have come to appreciate their objectivity, and most of us (over the age of FOUR) do fondly remember the days when AMD was actually competitive and AT reported just that.
  • hha - Tuesday, February 17, 2009

    On one hand, the article says that hardware threading is not the major factor of Intel's advantage over AMD.

    But, 4 paragraphs later, the article puts a very strong word on AMD's lack of hardware threading killing its performance.

    Which one is the stand now?
  • hha - Tuesday, February 17, 2009

    Another possible reason why AMD Shanghai is losing is its smaller L3 cache (25% smaller than Nehalem's).

    If SAP loves low-latency memory but doesn't care about memory bandwidth, isn't this an indication of SAP's poor locality? And Nehalem's L3 cache is 33% larger than Shanghai's.

    Agree that the low IPC means a massive CMT processor is a good candidate. Though for SAP, the article somewhat invalidates this due to extensive data sharing across threads.
  • Wozar - Monday, February 16, 2009

    Or is it just me?

    The real story here isn't between Intel and AMD - no one really cares about that battle. The real story here is that installing a large SAP instance in the past would cost hundreds of thousands (sometimes millions) of $ in hardware - predominantly IBM Power PC or HP/Sun Unix platforms (or, for the crazy people, Itanium rack based toasters).

    The sensational part of this story is that a medium sized SAP instance can now be run on under $100k of hardware.

    25000 SAPS per box is astonishing - it wasn't long ago that 25000 SAPS would have cost $300k to provision - now it is going to be available in a single WINTEL box (HP DL380 G6 or something similar).

    Flame away,

    Wozar
  • mobilecomputing - Monday, February 16, 2009

    I love the Intel roadmaps and presentations in general. Almost as slick as Apple. I think if Intel went into making devices for their chips they'd probably do very well.

    http://news.idealo.co.uk/voucher-codes/
  • Lucentmoon - Saturday, February 14, 2009

    Why is the author having to defend his article??
    Sure it's a little one-sided, but hey, that's life sometimes.

    From a business aspect...
    the few people who make the decisions on purchases don't look at just SAP benches.
    IN FACT - they don't look at many benches at all. It's a small factor in the big scheme of things. The global company I work for has a server room the size of a football field. The cost of backup media is MUCH higher than server cost.
    They look at price/performance & longevity. Especially in times like these, where every company should be & IS looking to cut everything possible. Lower power consumption wins hands down. I don't care if it's Intel or AMD, in the long run a lower power bill wins. A little well-known secret: the life cycle of desktops/laptops is shorter than that of servers, and with 1000+ laptops/desktops per site and 50+ sites in the US alone, you're going to look at long-term savings in power consumption to offset the cost of a PC's life cycle. Again, because in these times NOBODY with half a brain is going to go and buy the biggest and best server when even yesterday's servers can last 10+ years if stretched. Not many large corporations are shelling out the dough for the best of the best.

    IF Intel dramatically cut pricing, they would EASILY once and for all crush AMD more than they already have these last few years.
    And if Intel can spend 7 billion on fabs for 32nm, they can squash AMD with unrealistic price cuts. Then you'll see huge power bills & jacked up prices FOREVER. Personally, I buy AMD for this sole reason, not for performance OR price.

    So someone explain why the author gets bashed??? That's simply childish. He shouldn't have to defend his own articles because of some stupid AMD-Intel war. It's simply business as usual for EVERY LARGE CORPORATION to try and beat out the competition.
  • garydale - Thursday, February 12, 2009

    While gamers might look at the absolute performance, and workstation users might look at price/performance, enterprise server rooms operate on the total cost. Can you save power, reduce heat, shrink the space, etc.? While a SAP benchmark might attract notice, no one is going to redesign their server room based on it - especially if they're not using SAP.
  • Mr Roboto - Wednesday, February 11, 2009

    Wasn't this expected? I mean this IS what i7 was built for. Intel isn't happy with desktop and notebook domination, they want the whole thing. They obviously want to put AMD out of business and just might with these i7 server chips.

    Anand even stated before i7 was made for servers and this is why you only see a 15% increase (roughly) over highly clocked Core2Quads in the desktop and gaming consumer space.

    I really hope AMD can come up with some 8 core 45nm server chips, otherwise this could be the nail in the coffin.
  • zagortenay - Wednesday, February 11, 2009

    Yes, I am an AMD fan, and I can see so many sheep-like creatures licking the knife of a big, bad, ugly butcher named Intel.
  • Zak - Wednesday, February 11, 2009

    It's really hard to praise AMD since the best they can do with their top-of-the-line CPU is to compete with the last generation of Intel CPUs. Intel clearly has momentum and good products in the pipeline, and AMD lost their momentum a long time ago and are clearly struggling. AMD provides value CPUs these days. I built my HTPC around a quad AMD CPU because it was inexpensive; the whole platform was cheap. But if I want the most speed for my apps and games I'd have to be crazy not to go with i7.

    Z.