Limitations of this report

We are happy that we finally feel comfortable with most of our virtualization testing. We still have to do some in-depth profiling to be completely sure what is going on, but we decided to not wait any longer. This is only the beginning, though. We have tested several other virtualization scenarios (including Windows as Guest OS, Hyper-V as hypervisor, Oracle as database, and so on) but we are still checking the validity of those benchmarks. In other words, we are well aware that this report cannot give you a complete picture; it's only an initial rough draft.

Here are the limitations of our current virtualization testing:

  • Out of all the databases, MySQL has shown the best performance on the AMD platform relative to the Intel platform. This is probably a result of the excellent Opteron and Athlon 64 optimizations in the gcc compiler.
  • We use a 64-bit version of MySQL, and the Intel architectures pay a small penalty when you run a 64-bit database (no macro-op fusion for example). However, as the 64-bit MySQL performs quite a bit better than the 32-bit one, we feel we made the right decision.
  • Our best Opteron is a 95W Opteron 8356, while we used a 130W Xeon X7460 and a 130W Xeon X7350. This is simply a result of what we have had available in the labs in the past months. This problem is easy to solve: the performance of the Opteron 8360SE (125W) will be between 1% and 8% higher, so for those looking at the Opteron 8360SE it is pretty easy to get an idea of what this CPU could do.
  • No HPC benchmarking, as we wanted to focus our efforts and time on our first virtualization results. Priorities…

Please keep these limitations in mind.

Conclusion

The third-party benchmark numbers are unanimous: servers based on Intel's monster hex-core processor are the best choice for high-end database/ERP applications. Compared to the previous Xeons, performance has increased by 40% or more while power consumption has dropped. The 6-core Xeon is the clear winner and offers a very nice upgrade path for owners of current Xeon 73xx servers. We even dare to predict that the newest Nehalem based Xeons will not really enter this market before the octal-core Beckton is launched in the second half of 2009.

When it comes to the virtualization market, which is a much larger market (in shipments), it is a very different picture. Where the 6-core CPU extends an existing lead elsewhere, for virtualization the new 45nm Xeon MP comes just in time. The quad-core Opteron has been giving the Xeon 73xx a serious beating, offering up to 24% better performance while using 20-25% less power (X7350 versus 8356). If you prefer to look at CPUs with approximately the same TDP, the Opteron was offering about a third more performance while consuming a few Watts less. The hot and power hungry FB-DIMMs do not help in a market where performance/Watt and more memory (higher consolidation ratios) rule, and the Opteron clearly has better virtualization support.
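To put those two figures into a single performance/Watt number, here is a minimal sketch of the arithmetic. The function name is ours and purely illustrative; it also assumes the best-case 24% performance advantage and the 20-25% power saving apply to the same workload, which the measurements do not guarantee, so treat the result as an upper bound rather than a measured value.

```python
def perf_per_watt_gain(perf_gain, power_saving):
    """Relative performance/Watt advantage of system A over system B.

    perf_gain:    fractional performance advantage of A (0.24 = 24% faster)
    power_saving: fractional power reduction of A (0.20 = 20% less power)
    """
    # A delivers (1 + perf_gain) times the work using (1 - power_saving)
    # times the power, so the ratio of the two is the perf/Watt factor.
    return (1.0 + perf_gain) / (1.0 - power_saving) - 1.0


# Figures quoted above for the Opteron 8356 versus the Xeon X7350.
low = perf_per_watt_gain(0.24, 0.20)
high = perf_per_watt_gain(0.24, 0.25)
print(f"Opteron 8356 vs Xeon X7350: {low:.0%} to {high:.0%} better performance/Watt")
```

Under these assumptions the quad-core Opteron's best case works out to roughly one and a half times the performance/Watt of the X7350, which is the gap the new 45nm Xeon had to close.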

The new 45nm Xeon X7460 brings the virtualized performance/Watt crown back to the Intel camp, and we expect the E7450 (2.4GHz) to offer an even better performance/Watt ratio. After all, the E7450 also has six cores but at a lower TDP. In the very near term, AMD will probably have no other choice than to lower the price of its fastest quad-cores. Nevertheless, the battle for the virtualization market is still not over, as both AMD and Intel have new quad-cores lined up.

Quite a few people gave us assistance with this project, and as always we would like to thank them. Our thanks go to Sanjay Sharma, Trevor Lawless, Kristof Sehmke, Matty Bakkeren, Damon Muzny, Brent Kerby, Michael Kalodrich and Angela Rosario. A very special thanks to Kaushik Banerjee, who pointed out errors in our virtualization benchmarking procedure, and Tijl Deneut, who helped me solve the weirdest problems despite the numerous setbacks we encountered in this project.


34 Comments


  • duploxxx - Thursday, November 13, 2008 - link

    your virtualisation life was very short, perhaps marketing can keep you alive for a while since on paper you are better with the amount of cores.

    your 24 cores @2,66ghz are just killed by 16cores @2,7ghz

    http://www.vmware.com/products/vmmark/results.html
  • synergyek - Wednesday, October 15, 2008 - link

    Why only testing scanline render? It's a slow and old monster. Can you add mental ray render to your tests or, maybe, vray, which is used in arch. visualizations? Also you can use Maya 32/64-bit (software, hardware, mental ray tests) for both windows and linux platforms. Mental ray on Vray uses all cores available in the system, and results must be much better, than ordinary scanline.
  • duploxxx - Saturday, September 27, 2008 - link

    Nice article, although in virtualisation with VMmark it was already clear that the new dunnington had more headroom with the additional cores.

    only few remark, since you are talking about a retail price of +25000euro you could at least add for information that there are 8socket barcelona for about 5000euro more that scale again way better then dunnington with its 32 cores. So indeed intel did a step up again after there tigertown was heavy beaten by new barcelona in 4s even in low speed but at a certain cost of platform, afterall this dunnington is not cheap. it will be the question what a 4s shangai @3.0 ghz will do against this 6 core giant, afterall it is a huge die and the shangai will be way cheaper and consume less.

    lets hope you update this nice article with the soon to be released shangai.
  • Sirlach - Friday, September 26, 2008 - link

    From my research when the hex cores were announced the super micro boards came with an x16 slot. Is it possible to see how CPU restricted multithreaded games perform on this monster? Since it is running server 2008 this is theoretically possible!
  • BaronMatrix - Thursday, September 25, 2008 - link

    It seems like a better comparison would be with the number of cores the same. You could take a 4S and remove one chip and match that against a 2S Dunnington.

    From what I saw, it is nowhere near 50% faster though it has 50% more cores plus 4 times the cache. It looks like Intel may NEVER catch up with Opteron. Shanghai will just increase the difference.

    It's just a shame Hector decided to have a "devalue the brand name" fire-sale or we'd be much closer to Bulldozer and SSE5.
  • trivik12 - Thursday, September 25, 2008 - link

    4S has been one market where AMD dominated even after conroe's release. With Tigerton intel chipped away AMD's market share bcos of barcelona issues. with Dunnington Intel has a performance advantage. U dont look at per core performance but overall platform performance. AMD needs to catch up soon bcos with beckton AMD will be behind 8th ball in that market as well.
  • snakeoil - Wednesday, September 24, 2008 - link

    intel is cannibalizing nehalem this are desperate measures from a desperate man.
    this is a dead end road, sooner or later intel will have to dump the front side bus,but its evident that intel is not very confident about nehalem and quick path.
    these processor are the last kick of an agonizing technology.
    this is just a souped up old car. nothing more.
  • kingmouf - Wednesday, September 24, 2008 - link

    Although a good thing for testing, I'm wondering if by making artificially the VMs more processing intensive rather than memory intensive, one is getting a quite wrong idea of the power consumption between the Intel and the AMD systems.

    Off-chip activity (coming from signal amplifiers, sensors, external buses etc) results in great power consumption. Actually one should expect it to be a crucial part of the total consumption of a system. In this case, I believe the AMD system has an advantage with the memory controller being incorporated in the processor chip. To an extent this also becomes clear in your testing.

    Comparing the Intel CPUs one may observe that the 6-core part has a huge cache memory that seriously limits the main memory accesses. In the case of the 6VMs, there will also be reduced inter-socket communication. Both result in very serious reductions in off-chip activity, which materialises in a whopping 25% reduction in power usage.

    Therefore I believe that making the benchmarking process more memory intensive, as you point is the real-world scenario, AMD could earn quite a few points there.


    On a more general argument now, I can't stop thinking that chips like the 74xx Xeons are somewhat a waste of transistors. Intel is simply following the "bully" path rather than the "smart" path. I cannot stop thinking what would the results look like if instead of the two extra cores and the huge amount of cache, Intel added a TCP offload engine, a true hardware RAID controller, a block cipher accelerator, a DSP engine or an extra FP processor core (I'm not mentioning a memory controller because someone will pop up and say that they have already done that in the nehalem). All these things - and one could add much more - are integral to any server or HPC system and I believe can offer much more countable results than the two extra cores. Better performance and definitely better power usage. On the other hand, considering the weaknesses of AMD, maybe that is the company that should really get down to it.

    Not long there was a lot of hype of AMD opening up their socket and coherent HyperTransport so that people could actually produce accelerators. What has happened with that? Are there any products on that market? It would be interesting to see some benchmarking with these things. :)
  • JohanAnandtech - Thursday, September 25, 2008 - link

    "Im wondering if by making artificially the VMs more processing intensive rather than memory intensive, one is getting a quite wrong idea of the power consumption between the Intel and the AMD systems. "

    You are right that most virtualized workloads (including the OLTP ones) will need a lot more memory *space*, but they are not necessarily more memory intensive. It is good practice for example to use another scheduler to make it more CPU intensive: you are getting more transactions per second on the same machine. It is pretty bad to lose your watts on anything else but transactions.
  • jedz - Wednesday, September 24, 2008 - link

    It's pretty obvious that AMD is not competing neck to neck in the server arena with their current opteron offerings because of the fact that they are way behind Intel's, and it's not right to compare the opteron to an intel7460 in terms of performance/watt. Why don't you wait for AMD's Shanghai and then redo this benchmarking process.

    Maybe it will do justice for AMD....
