AnandTech Home IT Portal Home Increase Font Size Decrease Font Size Change Page Size
Expensive Quad Sockets vs. Ubiquitous Dual Sockets
Expensive Quad Sockets vs. Ubiquitous Dual Sockets
Date: October 6th, 2009
Topic: IT Computing
Manufacturer: Various
Author: Johan De Gelas
Buy the Opteron Third-Generation Processor
Blank
 
 

AMD's dual and quad platform: consistency

AMD's PR is making a lot of noise about consistency, and rightly so. The quad socket and dual socket processors are - besides the obviously different multiprocessor capabilities - exactly the same. In the case of virtualization, this allows you to optimize your virtual machines and hypervisors once and then clone them as much as you like. There are fewer worries when moving virtual machines around, and there is no fiddling with masking processor capabilities. This is also well illustrated when you check what mode the VMware ESX virtual machines run. The table is pretty simple when you look at VMs running on top of an AMD processor: the virtual machines running on dual Opterons will run software virtualization, while the quad-cores will almost always run in the fastest mode (hardware virtualization combined with hardware assisted paging). The same is true for Hyper-V: it won't run on the dual-core Opterons and it will run at full speed on the quad-cores. It is remarkably simple compared to the complete mess Intel made: some of the old Pentium 4 based CPUs support VT-x, some don't. Some of the lower end Xeons launched in 2007 and 2008 don't and so on.

There is some inconsistency on HyperTransport and L3 cache speeds, but those will only cause small performance variations and no software management troubles. Of course, AMD's very consistent dual and quad socket platform is not without flaws either. The NVIDIA MCP55 Pro chipset was at times pretty quirky when installing new virtualization software. Most of the time, a patch took care of that, and the Opteron servers were running rock solid afterwards, but in the meantime a lot of valuable time was wasted. Also, the current platform has not evolved for years and is starting to show its age: we found out that the motherboards consume a bit more power than they should. In 2010, all Opteron server platforms will use AMD chipsets only.

The core part of the new hex-core Opteron is the identical to that of the quad-core, but the "uncore" part has some improvements. With the exception of the 2.8GHz 2387/8387 and 2.9GHz 2389/8389, most quad-core Opterons still connect with 1GHz HyperTransport links. The hex-core Opteron runs with speeds between 2 and 2.4GHz. The hex-core Opteron always connects to the other CPUs in the server via 2.4GHz HyperTransport links. That makes little difference in a 2P server, but performance gets quickly limited by interconnection speeds in 4P. Even at 2.4GHz (9.6GB/s interconnect), probe broadcasting can limit performance, and that is why you can reserve up to 1MB of cache for a snoop filter. These improvements make the hex-core Opteron a more interesting choice than the quad-core Opterons - even at lower clock speeds - for quad socket servers.

In fact, we feel that besides the very low power Opteron 2377 EE, the quad-core Opterons are of little use. If your application scales relatively badly, there is the X55xx series which offers much better "per thread" performance. If your application scales well, two 2.6GHz Opteron 2435 will offer 15% better (and sometimes more) performance than a 2.9GHz Opteron 2389 with the same power consumption. Using relatively "old" technology such as DDR2, the hex-core Opteron based servers are very affordable, especially if you compare them with similar Xeon servers.

The Intel Dual socket platform: pricey performance and performance/watt champion

We have already tested the new dual socket "Nehalem" Xeon platform. It is the platform with the fastest interconnects, the most threads per socket (thanks to Hyper-Threading), the most bandwidth (triple-channel) and the most modern virtualization features (Intel VT-D). Even the top models are far from power hogs: at full load, the X5570 offers an excellent performance/watt ratio. The low-power L5520 at 2.26GHz was a real champion in our performance per watt tests and is available at reasonable prices.

The relatively new platform (chipset, DDR3) is still on the expensive side: a similarly configured Dell R710 (two Xeon 5550 2.66GHz, 8 x 4GB 1066MHz DDR3) costs about one third more than a Dell R805 (Two Opteron 2435, 8 x 4GB 800MHz DDR2): $5047 versus $3838 (pricing at the end of September 2009). If you chose the Xeon platform, you should be aware of the fact that Intel's low end is much less interesting: the best Xeon 55xx CPUs have a clock speed between 2.26 and 2.93GHz. The low end models, the 5504 and 5506 are pretty crippled, with no Hyper-Threading, no Turbo Boost, and only half as much L3 cache (4MB). These crippled CPUs can keep up with the quad-core Opterons at about 2.5GHz, but they are the worst Xeons when you look at idle and full load power. The performance per Watt of the Xeon EE550x is pretty bad compared to the more expensive parts.

The Intel Quad socket platform

There is no quad socket version of Intel's excellent "Xeon Nehalem" platform. We will have to wait until the Nehalem-EX servers ship in the beginning of 2010. At that time, servers with the octal-core 24MB L3 cache CPU will almost certainly end up in a higher price class than the current quad socket servers. One indication is that Intel positions the Nehalem-EX as a RISC market killer. Then again, Intel might as well bring out quad-core versions too. We will have to wait and see.

So there's no Hyper-Threading, Turbo Boost, EPT, NUMA, or fast interconnects for the current Xeon "Dunnington" platform, which is still based on a "multi independent FSB" topology. It has massive amounts of bandwidth in theory (up to 21GB/s), but unfortunately less than 10GB/s is really available. Snooping traffic consumes lots of bandwidth and increases the latency of cache accesses. The 16MB L3 cache should lessen the impact of the relatively slow memory subsystem, but it is only clocked at half the clock speed of the core. A painful 100 cycle latency is the result, but luckily every two cores also have a shared and fast 3MB L3 cache.

When it was first launched, the Xeon MP defeated the AMD alternatives by a good margin in ERP and heavy database loads. It reigned supreme in TPC-C and broke a few new records. More importantly it took back 9% of market share in the quad socket market according to the IDC Worldwide Server Tracker. But at that time, the 2.66GHz hex-core had to compete with a 2.5GHz quad-core Opteron with a paltry 2M of shared L3, and AMD has been working hard on a comeback. The massive Intel chip (503 mm2) has to face a competitor that has three times as much L3 cache and 50% more cores at higher clock speeds, and that is not all: the DDR2-800 DIMMs deliver up to 42GB/s or four times as much bandwidth to the four AMD chips. At the same time, the Xeon behemoth has to outpace the ultra modern Dual Xeon platform by a decent margin to justify its much higher price.

What Intel and AMD Are Offering   Next Page

 
  Index

Tools Share
Find lowest prices Find the lowest prices
Digg   del.icio.us   E-mail  
Print This Article Print this article  

31 Comments - Last by joekraska, 43 days ago
Username:
Password:
Quad Opteron 2389??? by Casper42, 45 days ago
I know its late, but on page 4 of this article you say your using a Dual 2389 setup where each chip is Quad Core.

Somehow that morphs into a "Quad Opteron 2389" on page 6 both in the text and in the graphic. Since a Quad 2xxx is not possible, is this a Dual 2389 or a Quad 8389?

Then on page 7 it becomes a Quad Opteron 8389

Am I losing my mind?

I see now that both a Quad 8389 and a Dual 2389 are listed in Page 4, but why on earth did you guys bounce back and forth so much between them?

Reply
RE: Quad Opteron 2389??? by JohanAnandtech, 45 days ago
You are not losing your mind. The Quad 2389 is a quad 8389 of course. I have fixed the error. Thanks.

The dual sockets machines were mostly used to check how the software scales (MS SQL server, virtualization) and how the power consumption compares to the quad socket machines. I hope this makes it clear?

Reply
Power Supply question by mamisano, 45 days ago
Great article, just have a question about the power supplies. Why do the quad-core servers need a 1200W PSU if the highest measured load was 512W? I know you would like to have some head-room but it looks to me that a more efficient 750 - 900W PSU may have provided better power consumption results... or am I totally wrong? :)

Reply
RE: Power Supply question by JarredWalton, 45 days ago
Maximum efficiency for most PSUs is obtains at a load of around 40-60% (give or take), so if you have a server running mostly under load you would want a PSU rated at roughly twice the load power. (Plus a bit of headroom, of course.)

Reply
RE: Power Supply question by JohanAnandtech, 44 days ago
Actually, the best server PSUs are now at maximum efficiency (+/- 3%) between 30 and 95% load.

For example:
http://www.supermicro.com/products/powersupply/80PLUS/80PLUS_PWS-721P-1R.pdf

And the reason why our quads are using 1000W PSUs (not 1200) is indeed that you need some headroom. We do not test the server with all DIMM slots filled and you also need to take in account that you need a lot more power when starting up.


Reply
ram by hifiaudio2, 45 days ago
FYI the R710 can have up to 192gb of ram...

12x16GB

not cheap :) but possible



Reply
RE: ram by JohanAnandtech, 45 days ago
at $300 per GB, or the price of 2 times 4 GB DIMMs, I don't think 16 GB DIMMs are going to be a big success right now. :-)

Reply
RE: ram by wifiwolf, 44 days ago
for at least 5 years you mean

Reply
2 x 2 = sweet spot by Candide08, 45 days ago
One performance factor that has not improved much over the years is the decrease in percentage of performance gains for additional cores.

A second core adds about 60% performance to the system.
Third, fourth, fifth and sixth cores all add lower (decreasing) percentages of real performance gains - due to multi-core overhead.

A dual socket dual core system (4 processors) seems like the sweet spot to our organization.

Reply
RE: 2 x 2 = sweet spot by Calin, 44 days ago
If your load is enough to fit into four processors, then this is great. However, for some, this level of performance is not enough, and more performance is needed - even if paying four times as much for twice as much performance

Reply
Comments Page 1 of 4

Free Forrester Risk Management Report
Demystifying Enterprise Risk Management. Download Free With Registration.
DOWNLOAD vWire Today - FREE TRIAL
Take Control of Your Virtual Infrastructure. Manage VI Data & Prevent Problems.
Report Unlicensed Business Software Use
Earn Up to $1 Million by Reporting Unlicensed Software Use. Fill Out Our Form!
Download Microsoft Visual Studio ® Team System
Streamline Dev processes, Reduce time to market. Try Microsoft Visual Studio Team System, FREE!
Supermicro Barebone Servers
We Carry Everything Supermicro. Low Price, Top Service, FREE Shipping, and more.




Latest news by
DailyTech

 November 20, 2009

Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank

 November 19, 2009

Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank




pipeboost
Copyright © 1997-2009 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.
Click Here for Advertising Information