Magny-Cours

You probably heard by now that the new Opteron 6100 is in fact two 6-core Istanbul CPUs bolted together. That is not too far from the truth if you look at the micro architecture:  little has changed inside the core. It is the “uncore” that has changed significantly: the memory controller now supports DDR-1333, and a lot of time has been invested in keeping cache coherency traffic under control. The 1944-pin (!) organic Land Grid Array (LGA) Multi Chip Module (MCM) is pictured below.

The red lines are memory channels, blue lines internal HT cache coherent connects. The gray lines are external cache HT connections, while the green line is a simple non coherent I/O HT connect.

Each CPU has two DDR-3 channels (red lines). That is exactly the strongest point of this MCM: four fast memory channels that can use DDR-1333, good for a theoretical bandwidth peak of 42.7 GB/s. But that kind of bandwidth is not attainable, not even in theory bBecause the next link in the chain, the Northbridge, only runs at 1.8GHz. We have two 64-bit Northbridges both working at 1.8 GHz, limiting the maximum bandwidth to 28.8 GB/s. That is price AMD’s engineers had to pay to keep the maximum power consumption of a 45nm 2.2 GHz  below 115W (TDP).

Adding more cores makes the amount of snoop traffic explode, which can easily result in very poor scaling. It can get worse to the point where extra cores reduce performance. The key technology is HT assist, which we described here.  By eliminating unnecessary probes, local memory latency is significantly reduced and bandwidth is saved. It cost Magny-cours 1MB of L3-cache per core (2MB total), but the amount of bandwidth increases by 100% (!) and the latency is reduced to 60% of it would be without HT-assist.

Even with HT-assist, a lot of probe activity is going on. As HT-assist allows the cores to perform directed snoops, it is good to reach each core quickly. Ideally each Magny-cours MCM would have six HT3 ports. One for I/O with a chipset, 2 per CPU node to communicate with the nodes that are off-package and 2 to communicate very quickly between the CPU nodes inside the package. But at 1944 pins Magny-Cours probably already blew the pin budget, so AMD's engineers limited themselves to 4 HT links.

One of the links is reserved for non coherent communication with a possible x16 GPU. One x16 coherent port communicates with the CPU that is the closest, but not on the same package. One port is split in two x8 ports. The first x8 port communicates with the CPU that is the farthest away: for example between CPU node 0 and CPU node 3. The remaing x16 and x8 port are used to make communication on the MCM as fast as possible. Those 24 links connect the two CPU nodes on the package.

 

The end result is that a 2P configuration allows fast communication between the four CPU nodes. Each CPU node is connected directly (one hop) with the other one. Bandwidth between CPU node 0 and 2 is twice than that of P0 to P3 however.

Whilte it looks like two Istanbuls bolted together, what we're looking at is the hard work of AMD's engineers. They invested quite a bit of time to make sure that this 12 piston muscle car does not spin it’s wheels all the time. Of course if the underground is wet (badly threaded software), that will still be the case. And that'll be the end of our car analogies...we promise :)

Index The SKUs
POST A COMMENT

58 Comments

View All Comments

  • kokotko - Saturday, April 24, 2010 - link

    why you are NOT SHARIG same "shareable" components - like PSU ??????

    NO WONDER THE NUMBERS ARE WORSE ! ! !
    Reply
  • blurian589 - Tuesday, May 11, 2010 - link

    3ds max crashes because of the mental ray renderer. remove the plugin from loading and max will start up. its due to mental ray cannot see more than 16 threads (physical or virtual via hyper-threading). please do test the max rendering performance. thanks Reply
  • Desired_Username - Tuesday, June 29, 2010 - link

    In the final words it states "We estimate that the new Opteron 6174 is about 20% slower than the Xeon 5670 in virtualized servers with very high VM counts. " But in the virtualization section I can't seem to figure out what brought you to that conclusion. The VMmark scores for the Cisco X5680 system was 35.83@26 tiles. You have the VMmark for the 6176SE at 31 which is dead on to the HP DL385 G7 which got 30.96@22 tiles. I see the X5680 15% better at best. And the Cisco x5680 system had 192GB of memory to the HP 6176SE system had 128GB. What am I missing here? Reply
  • jeffjeff - Wednesday, September 22, 2010 - link

    I appreciate AMD's lower CPU cost but on the other hand, Oracle will license me their RDBMS per core and whether it's an Intel 56xx or AMD 61xx, I am still paying a relation of .5 license per core.

    So in the end, I would pay 6 cores for AMD and 3 cores for Intel. The price per core is much higher than the hardware price difference.

    Any thought or solutions on this issue would be appreciated...

    Joffrey
    Reply
  • stealthy - Wednesday, November 24, 2010 - link

    Would it be possible to get the xml parameter files you have used in this test ?
    We are currently in a trial phase at my company to see how the current crop of intel boxes (dual Xeon X5460 procs) hold up against a new z10 system.
    Did you run the swingbench on the server itself or did you use a dedicated client to test ?
    Reply
  • Big_Mr_Mac - Thursday, December 16, 2010 - link

    In 1991 I had an AMD 386-40 that kicked the snot out of Intel pride and joy 486DX2-66. Benchmarks were 25%+ across the board over Intel. Then Intel lied to the market and started passing off cull processors as viable options calling the 25Mhz and 50Mhz processors, when they were actually processors that failed the benchmarks for 33Mhz and 66Mhz respectively.

    In 1998 When Win98 Beta was released I was building Servers and workstations at a Tech-company and Again the AMD was kicking the snot out of Intel. Load times on new system builds, boot time and performance. The Intel chips could not hack it. Then when MS release their actual market version of Win98...all of a sudden you could not even use an AMD processor to run it. You had to wait 2 weeks for MS come up with a "AMD Patch" to run on an AMD system.

    One think I have seen over 20 years in the industry is that Intel will, Lie, Cheat, Steal and Bribe to try and get the upper hand on AMD. Always have....Always Will!!
    Reply
  • rautamiekka - Saturday, December 25, 2010 - link

    Why the fuck are you testing with WinServer and M$ SQL ? Just reading this makes my blood boil 9 times in a second. Reply
  • polbel - Saturday, May 21, 2011 - link

    i've been an amd fan for as long as i can remember. started fixing computers in 1979. used to fix mai basic four minis in the mid-80s that were built on amd bit-slice bipolar cpus on boards that cost 15,000$.

    just got 2 opteron 6172 cpus from ebay for what i thought was peanuts (450 $ each) only to discover upon delivery that both had hairline cracks at a 45 degree angle on one corner of the contact pad surface. looking at their web site i could figure i was out on limb and they would laugh in my face if asked for warranty support on these not-boxed cpus. i know some dumb ass managed to break those cpu corners, and tried to shove the crap to an ebay sucker, but the problem lies deeper, mostly in the g34 socket physical design itself of these otherwise beautiful electronic products. the edge of the metal cover doesn't reach the edge of the fiber board, leaving some unsupported area to be broken by dumb asses mimicking the old days when they could put a 40-pin dip cpu upside-down in its socket. so i'm freshly reviewing my belief system about amd while i figure a solution for this crap-hits-the-fan situation. wish i could have told amd engineers to cover theses last millimeters at the bleeding edge. they might say this and that about warranty, i still hold them responsible for this preventable disaster.

    paul :-)
    Reply

Log in

Don't have an account? Sign up now