AMD's 12-core "Magny-Cours" Opteron 6174 vs. Intel's 6-core Xeon

Author: Johan De Gelas

by Johan De Gelas on March 29, 2010 12:00 AM EST

Magny-Cours

You have probably heard by now that the new Opteron 6100 is in fact two 6-core Istanbul CPUs bolted together. That is not far from the truth if you look at the microarchitecture: little has changed inside the core. It is the “uncore” that has changed significantly: the memory controller now supports DDR3-1333, and a lot of time has been invested in keeping cache coherency traffic under control. The 1944-pin (!) organic Land Grid Array (LGA) Multi Chip Module (MCM) is pictured below.

The red lines are memory channels and the blue lines are internal cache-coherent HT links. The gray lines are external cache-coherent HT links, while the green line is a simple non-coherent I/O HT link.

Each CPU node has two DDR3 channels (red lines). That is exactly the strongest point of this MCM: four fast memory channels that can use DDR3-1333, good for a theoretical peak bandwidth of 42.7 GB/s (4 channels × 8 bytes × 1333 MT/s). But that kind of bandwidth is not attainable, not even in theory, because the next link in the chain, the Northbridge, only runs at 1.8 GHz. The two 64-bit Northbridges (one per CPU node) both run at 1.8 GHz, which limits the maximum bandwidth to 28.8 GB/s (2 × 8 bytes × 1.8 GHz). That is the price AMD’s engineers had to pay to keep the maximum power consumption (TDP) of a 45nm 2.2 GHz part below 115W.
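Both peak figures follow from the same arithmetic: channels × bytes per transfer × transfer rate. The minimal Python sketch below reproduces them; the helper name `peak_bandwidth_gbs` is hypothetical, and it assumes each Northbridge moves one 64-bit word per 1.8 GHz clock, as the paragraph implies.

```python
def peak_bandwidth_gbs(channels: int, bus_width_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak in GB/s (10**9 bytes/s): channels x bytes per transfer x transfer rate."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfers_per_s / 1e9


# Four 64-bit DDR3-1333 channels per package (two per CPU node), 1333 million transfers/s.
dram_peak = peak_bandwidth_gbs(channels=4, bus_width_bits=64, transfers_per_s=1333e6)

# Two 64-bit Northbridges (one per CPU node), assumed to move 8 bytes per 1.8 GHz clock.
nb_peak = peak_bandwidth_gbs(channels=2, bus_width_bits=64, transfers_per_s=1.8e9)

print(f"DRAM peak:        {dram_peak:.1f} GB/s")
print(f"Northbridge peak: {nb_peak:.1f} GB/s")
print(f"Usable fraction:  {nb_peak / dram_peak:.0%}")
```

Running it prints 42.7 GB/s for DRAM and 28.8 GB/s for the Northbridges, so at best roughly 68% of the theoretical DRAM peak can get past the Northbridges.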

Adding more cores makes the amount of snoop traffic explode, which can easily result in very poor scaling. It can even get so bad that extra cores reduce performance. The key technology here is HT Assist, which we have described in an earlier article. By eliminating unnecessary probes, HT Assist significantly reduces local memory latency and saves bandwidth. It costs Magny-Cours 1MB of L3 cache per CPU node (2MB total), but bandwidth increases by 100% (!) and latency drops to 60% of what it would be without HT Assist.
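To see why a probe filter saves so much traffic, here is a toy Python model that counts probes for the same access trace with and without a directory. It is a hypothetical simplification, not AMD's HT Assist implementation: the names `broadcast_probes` and `directed_probes` are made up, every trace entry is assumed to be a request that reaches the home node, and cache capacity, coherence states and the L3 carve-out are ignored.

```python
# Toy comparison of broadcast probing vs. a directory-style probe filter.
# Hypothetical model: not AMD's HT Assist implementation.

NUM_NODES = 4  # four CPU nodes in a 2P Magny-Cours system

# Each entry: (requesting node, "R" for read or "W" for write, cache-line address).
trace = [
    (0, "R", 0x100), (0, "W", 0x100), (1, "R", 0x200), (2, "R", 0x300),
    (3, "R", 0x400), (0, "R", 0x100), (1, "R", 0x100), (2, "W", 0x100),
]


def broadcast_probes(trace, num_nodes=NUM_NODES):
    """Without a probe filter, each request is probed on every other node."""
    return len(trace) * (num_nodes - 1)


def directed_probes(trace):
    """With a directory, only nodes recorded as possibly holding the line are probed."""
    directory: dict[int, set[int]] = {}  # line address -> nodes that may cache it
    probes = 0
    for node, op, line in trace:
        holders = directory.get(line, set())
        probes += len(holders - {node})  # one directed probe per other holder
        if op == "W":
            directory[line] = {node}  # a write invalidates all other copies
        else:
            directory[line] = holders | {node}  # a read adds a sharer
    return probes


print("broadcast probes:", broadcast_probes(trace))
print("directed probes: ", directed_probes(trace))
```

For this mostly private trace, broadcasting costs 24 probes while the directory needs only 3, all of them sent to the nodes that actually hold line 0x100.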

Even with HT Assist, a lot of probe activity is going on. As HT Assist allows the cores to perform directed snoops, it pays to be able to reach each core quickly. Ideally, each Magny-Cours MCM would have six HT3 ports: one for I/O with a chipset, two per CPU node to communicate with the nodes that are off-package, and two to communicate very quickly between the CPU nodes inside the package. But at 1944 pins, Magny-Cours had probably already blown its pin budget, so AMD's engineers limited themselves to four HT links.

One of the links is reserved for non-coherent I/O communication, for example with a possible x16 GPU. One x16 coherent port communicates with the closest CPU node that is not on the same package. Another port is split into two x8 ports. The first x8 port communicates with the CPU node that is farthest away: for example, between CPU node 0 and CPU node 3. The remaining x16 and x8 ports are used to make communication on the MCM as fast as possible: together, these 24 lanes connect the two CPU nodes on the package.

The end result is that a 2P configuration allows fast communication between the four CPU nodes. Each CPU node is connected directly (one hop) to every other node. However, the bandwidth between CPU node 0 and node 2 is twice that between node 0 and node 3.
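The link layout described above can be checked with a small Python model of the 2P system. The node numbering (nodes 0 and 1 on one package, nodes 2 and 3 on the other) and the node 1 to node 3 and node 1 to node 2 links are assumptions inferred by symmetry; the text only names the 0-to-2 and 0-to-3 connections. Widths are in HT lanes, so at equal link clocks bandwidth scales with width.

```python
from itertools import combinations

# Coherent HT link widths (lanes) in a 2P Magny-Cours system.
# Assumption: package A holds nodes 0 and 1, package B holds nodes 2 and 3;
# the 1-3 and 1-2 links are inferred by symmetry with 0-2 and 0-3.
LINK_WIDTH = {
    frozenset({0, 1}): 16 + 8,  # on-package: remaining x16 plus one x8
    frozenset({2, 3}): 16 + 8,
    frozenset({0, 2}): 16,      # closest off-package node: full x16 link
    frozenset({1, 3}): 16,
    frozenset({0, 3}): 8,       # farthest node: one half of the split port
    frozenset({1, 2}): 8,
}


def is_fully_connected(num_nodes: int) -> bool:
    """True if every pair of nodes shares a direct link (any node is one hop away)."""
    return all(frozenset(pair) in LINK_WIDTH for pair in combinations(range(num_nodes), 2))


def lanes_used(node: int) -> int:
    """Total coherent lane width attached to one node."""
    return sum(width for pair, width in LINK_WIDTH.items() if node in pair)


ratio = LINK_WIDTH[frozenset({0, 2})] / LINK_WIDTH[frozenset({0, 3})]
print("one hop between all nodes:", is_fully_connected(4))
print("width ratio 0-2 vs 0-3:", ratio)
print("coherent lanes per node:", [lanes_used(n) for n in range(4)])
```

It reports that every node is one hop from every other, that the 0-to-2 link is twice as wide as the 0-to-3 link, and that each node drives 48 coherent lanes.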

While it may look like two Istanbuls bolted together, what we are really looking at is the result of a lot of hard work by AMD's engineers. They invested quite a bit of time to make sure that this 12-piston muscle car does not spin its wheels all the time. Of course, if the road is wet (badly threaded software), it still will. And that'll be the end of our car analogies...we promise :)

58 Comments

Accord99 - Monday, March 29, 2010
The X5670 is 6-core.
JackPack - Tuesday, March 30, 2010
LOL. Based on price?

Sorry, but you do realize that the majority of these 6-core SKUs will be sold to customers where the CPU represents a small fraction of the system cost?

We're talking $40,000 to $60,000 for a chassis and four fully loaded blades. A couple hundred dollars difference for the processor means nothing. What's important is the performance and the RAS features.
JohanAnandtech - Tuesday, March 30, 2010
Good post. Indeed, many enthusiasts don't fully understand how it works in the IT world. Some parts of the market (like HPC, rendering and webhosting) are very price sensitive and will care about a difference of a few hundred dollars, as the price per server is low. A large part of the market won't care at all. If you are paying $30K for a software license, you are not going to notice a few hundred dollars on the CPUs.
Sahrin - Tuesday, March 30, 2010
If that's true, then why did you benchmark the slower parts at all? If it only matters in HPC, then why test them on database workloads? Why would the IDMs spend time and money binning CPUs?

Responding with "Product differentiation and IDM/OEM price spreads" simply means that it *does* matter from a price perspective.
rbbot - Saturday, July 10, 2010
Because those of us with applications running on older machines need comparisons against older systems in order to determine whether it is worth migrating existing applications to a new platform. Personally, I'd like to see more comparisons to even older kit in the 2-3 year range that more people will be upgrading from.
Calin - Monday, March 29, 2010
Some programs were licensed per physical processor chip, others per logical core. Is this still the case, and if so, could you explain it based on the software used for benchmarking?
AmdInside - Monday, March 29, 2010
Can we get any Photoshop benchmarks?
JohanAnandtech - Monday, March 29, 2010
I have to check, but I doubt that anything besides a very exotic operation is going to scale beyond 4-8 cores. These CPUs are not made for Photoshop IMHO.
AssBall - Tuesday, March 30, 2010
Not sure why you would be running Photoshop on a high-end server.
Nockeln - Tuesday, March 30, 2010
I would recommend trying to apply some advanced filters on a 200+ GB file.

Especially with the new higher-megapixel cameras, I could easily see how some professionals would fork out the cash if this reduces the time they have to spend in front of the screen waiting for things to process.
