Conclusion

The six-core Opteron is not an alternative to the mighty Xeons in every application. The Xeons are more versatile thanks to their higher clock speeds, higher IPC, Hyperthreading and higher memory bandwidth. The Xeon 55xx series is clearly the better choice in OLTP, ERP, web serving and rendering, and there is little doubt that it will continue to reign in bandwidth-intensive HPC workloads. There are two types of applications where we feel that the AMD six-core deserves your attention: decision support databases and virtualization.

Since the launch of ESX 3.5, VMware has said more than once that performance-critical applications such as OLTP and Decision Support Databases will perform well on top of their hypervisor. Several enhancements make the newly launched vSphere 4 an even more attractive platform for such "heavy duty" applications. Hyper-V R2 and Xen 3.4 are clearly gearing up for the same task. So it is interesting that companies are now looking into virtualizing those performance-critical applications, the applications that still got their own dedicated server a few months ago. The motivation is that virtualizing these applications would allow the complete datacenter to be managed with the same flexibility as the light, already consolidated, applications. VMotion (Xenmotion, Live Migration) can then for example be used to migrate these applications faster and much more easily.

Of course, performance-critical applications are by definition more demanding when it comes to processing power. That is exactly what vApus Mark I measures: how well do performance-critical applications perform when they are virtualized? This is a relatively “new” market where the AMD 2435 shines. The new Opteron 2435 at 2.6 GHz was a pleasant surprise on vApus Mark I: it keeps up with more expensive Xeons on ESX 3.5 update 4 while consuming less power, and offers competitive performance/watt and performance/price ratios on vSphere 4. The six-core Opteron is about 11 to 30% slower on vSphere 4 than the 2.93 GHz Xeon X5570, but the overall cost of the Istanbul platform is significantly lower (DDR-2 versus DDR-3) and the 2.6 GHz 2435 consumes less power in a virtualized environment (*). On the condition that you optimize your hypervisor well to take advantage of the six cores (cell size, for example, is one critical optimization), we feel that the six-core Opteron is a worthy opponent for the Xeon “Nehalem” in this market. We tested only the 2435 versus the X55xx series; a comparison of the 2.53 GHz Xeon E5540 with the 2.4 GHz Opteron 2431 may show a slightly different picture. The six-core Opteron and the Xeon are both very competitive in this area, so factors other than performance, price and power might decide the matter. There is no clear winner in this part of the market, but the big news is of course that AMD offers a worthy alternative.
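
To put the 11 to 30% figure in perspective, here is a rough back-of-the-envelope sketch. It is not part of the vApus Mark I methodology; the function name is made up for illustration, and it assumes that platform cost or power draw can be boiled down to a single number per server. If the slower platform delivers (1 - slowdown) times the performance of the faster one, it matches the faster platform on performance/price (or performance/watt) once its price (or power draw) is no more than that same fraction of the faster platform's.

```python
def break_even_fraction(slowdown: float) -> float:
    """Return the largest fraction of the faster platform's cost (or power)
    that the slower platform may have while still matching it on
    performance per unit of cost (or power).

    slowdown is the relative performance deficit, e.g. 0.11 for "11% slower".
    Hypothetical helper, for illustration only.
    """
    if not 0.0 <= slowdown < 1.0:
        raise ValueError("slowdown must be in the range [0, 1)")
    # perf_slow = (1 - slowdown) * perf_fast, so
    # perf_slow / cost_slow >= perf_fast / cost_fast
    # holds exactly when cost_slow <= (1 - slowdown) * cost_fast.
    return 1.0 - slowdown


# The two ends of the 11 to 30% range quoted above for vSphere 4.
for slowdown in (0.11, 0.30):
    fraction = break_even_fraction(slowdown)
    print(f"{slowdown:.0%} slower: break-even at {fraction:.0%} "
          f"of the faster platform's cost or power")
```

In other words, under these assumptions a platform that is 30% slower has to cost (or consume) at most 70% of what the faster platform does to draw level; at 11% slower, 89% is enough.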

VMmark tells us that the Xeon X55xx handles large numbers of VMs much better. With “light VMs”, the amount of memory you can place in a server in many cases plays a more important role than the CPU. In that case you might be better off with a low-power quad-core instead of a six-core or high-clocked quad-core.

Lastly, the six-core Opteron will be a formidable competitor in the 4P market segment. But that is for a later article.

(*) Virtualized servers do not run idle very often.
 
A big thanks to Tijl Deneut for sacrificing his weekend to keep testing and checking together with me. Anand and Liz helped to get this article online, thanks!
40 Comments

  • iocedmyself - Wednesday, June 17, 2009

    Well, something that was not mentioned is that the 2P Opteron machine costs about $6700, whereas the Nehalem 2P machine is very near to $16,000.

    As for power consumption, a straight-up comparison would be the HP380 (Xeon) and the HP 385 (Opteron). At idle, both are 140W. With 100% CPU / RAM load, the 385 is around 300W and the 380 (Xeon) is about 450W.

    Another thing not discussed here: 4P Istanbul is 70-80% faster than 2P Nehalem, and there is no 4P Nehalem. 8P Istanbul is over 3 times as fast as 2P Nehalem. So until next-gen Nehalem, there is no competition in the high end, which probably has something to do with Istanbul orders being through the roof.

    I also have to wonder if these benchmarks were conducted using one of Intel's little helpful optimized compilers.
  • yasbane - Wednesday, June 10, 2009

    Would be nice to see some Unix or Linux benchmarks...
  • riskyburden - Thursday, June 4, 2009

    I might be naive here, but surely the majority of these applications favour clock speed and no more than two cores. Should there not be a bench for those companies that run multiple apps, such as SQL and AD or IPFX, all from one server, with a comparison there? I don't suggest it is good network practice, but that would interest me more.
  • mino - Friday, June 5, 2009

    For this part of the SMB market pretty much any dual-core CPU will do.

    Their bottleneck is almost always on the storage side, sometimes with insufficient memory.
    And most also run a default install where basic SW tweaks would gain hundreds of percent in performance.
  • befair - Wednesday, June 3, 2009

    Johan never proves me wrong. Even an article meant to talk about AMD Opteron starts with a good deal of "Intel is the king!" stuff, as usual.
  • alpha754293 - Wednesday, June 3, 2009

    What happened to them?

    I would have loved to see what the new 6-core AMDs would be able to do in this arena, since it is (presumably) a much more competitive offering than the fastest Xeons all around.
  • lopri - Tuesday, June 2, 2009

    A question: is the 'snoop filter' hardware-based? I read that it can be enabled/disabled via the BIOS, and since the cores are the same as the Shanghai cores... But my question is, whether it's hardware-based or software-based (BIOS), shouldn't this work for inter-core communication as well if AMD decides to implement it?
  • JohanAnandtech - Tuesday, June 2, 2009

    I have to check, but I am pretty sure it is both. The "uncore" part has changed somewhat on Istanbul.

    "shouldn't this work for inter-core communication as well if AMD decides to implement it"

    Since the L3-cache keeps copies of shared L2-cachelines, I don't think that will help. There is already a very fast way of communicating with little overhead.
  • tygrus - Monday, June 1, 2009

    I would like to know the performance difference when using a cell size of 3 instead of 6 on the 6-core units, or of 8 instead of 4 on the Xeon 4Core8Thread?

    We will have to wait until later for more raw performance numbers (e.g. memory local/system, SPEC CPU, task switching, OS/IO task servicing).

    How long before they update the boards for DDR3-based memory and better onboard IO?

    It's a pity the ESX 4.0 update hasn't helped AMD... are the improvements only available for Intel, or was it to correct a previous Intel-only problem? What can AMD/partners do to improve performance?
  • JohanAnandtech - Tuesday, June 2, 2009

    "I would like to know the performance difference when using a cell size of 3 not 6 on the 6-core units?"

    A cell size of 3 will not do any good if your VMs are MP. Even though ESX features "relaxed co-scheduling", there might be quite a few cases where the scheduler is not able to use all "slots" as some of the vCPUs of the VMs might be behind. From the moment you use more than 2 vCPUs, you will get situations where only one VM with 2 CPUs is scheduled on a cell of 3 CPUs. 8-cell: I have to try it.

    "How long before they update the boards for DDR3 based memory and better IO onboard ? "

    AMD's Fiorano platform, which will be available in a few weeks, should have better I/O (PCIe gen 2) but will still be DDR-2 based.

    DDR-3 CPUs are scheduled for 2010.

    "It's a pity the ESX 4.0 update hasn't helped AMD .. are the improvements only available for Intel or was it to correct a previous Intel only problem ? "

    VMware's docs tell us that CPU locking goes more quickly and that the scheduler is "cache aware", but most of the biggest improvements are EPT and better support for Hyperthreading.
