Finally, an EPIC battle!

Intel has executed well over the past two to three years: Swiss-clockwork well. The 45nm family has lowered power consumption significantly and raised performance by about 10 to 20%. That allows Intel to win almost every benchmark in the desktop and workstation market. But don't worry; things are a lot more interesting in the server market.

You might remember from our in-depth analysis that the 45nm Intel CPUs are at least as good as AMD's latest in raw floating point performance on a clock-for-clock basis. When it comes to pure integer power, the quad-core Opteron does not stand a chance against the 45nm Intel CPUs: the latter are clock-for-clock an impressive 40-45% faster. Add to that the fact that the fastest Intel CPU runs at 3.2GHz while AMD is stuck at 2.5GHz for the moment, and it is clear that the AMD chips cannot compete in single-threaded integer workloads. HP posted the SPEC CPU 2006 scores of two very similar servers:

SPEC2006 Performance Comparison
CPU Tested           | Server             | SPECint2006 (base - peak) | SPECfp2006 (base - peak)
Opteron 2356 2.3 GHz | ProLiant BL465c G5 | 13.2 - 14.8               | 16.2 - 17.8
Xeon L5410 2.33 GHz  | ProLiant BL460c    | 18.8 - 21.6               | 16.8 - 19.8
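
The clock-for-clock comparison can be checked against the HP figures above. The following is only a minimal sketch: it assumes that "clock-for-clock" simply means dividing each SPEC score by the clock speed in GHz, and the helper names `per_ghz` and `xeon_advantage` are made up for this illustration.

```python
# SPEC CPU2006 (base, peak) scores from the HP results quoted above.
scores = {
    "Opteron 2356": {"ghz": 2.3, "int": (13.2, 14.8), "fp": (16.2, 17.8)},
    "Xeon L5410": {"ghz": 2.33, "int": (18.8, 21.6), "fp": (16.8, 19.8)},
}


def per_ghz(cpu, suite):
    """Return the (base, peak) score divided by the clock speed in GHz.

    Assumption: score per GHz is a fair stand-in for "clock-for-clock".
    """
    entry = scores[cpu]
    return tuple(score / entry["ghz"] for score in entry[suite])


def xeon_advantage(suite):
    """Percentage lead of the Xeon over the Opteron, per GHz, as (base, peak)."""
    xeon = per_ghz("Xeon L5410", suite)
    opteron = per_ghz("Opteron 2356", suite)
    return tuple(100 * (x / o - 1) for x, o in zip(xeon, opteron))


for suite in ("int", "fp"):
    base, peak = xeon_advantage(suite)
    print(f"{suite}: base {base:+.1f}%, peak {peak:+.1f}%")
```

Under that assumption the integer lead works out to roughly 41% (base) and 44% (peak), in line with the 40-45% quoted earlier, while the floating point lead is only about 2% (base) to 10% (peak).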

The latest Opteron is left far behind in the integer benchmark, but is competitive in floating point when you compare clock-for-clock. It is not hard to see why Intel's 45nm CPUs are superior in single-threaded workloads. Luckily (for AMD), Intel took its time to introduce its impressive 45nm technology in the quad-socket market, and AMD only faces Intel's 65nm family for now.

Extended SPEC2006 Performance Comparison
CPU Tested           | Server             | SPECint2006 (base - peak) | SPECfp2006 (base - peak) | SPECint2006 rate (base - peak) | SPECfp2006 rate (base - peak)
Opteron 8356 2.3 GHz | ProLiant BL685c G5 | 12.2 - 13.8               | 15.1 - 17.2              | 160 - 184                      | 143 - 157
Xeon 7340 2.4 GHz    | ProLiant BL680c G5 | 18 - 20.4                 | 15.9 - 18.3              | 157 - 188                      | 100 - 108
Xeon 7330 2.4 GHz    | PRIMERGY RX600 S4  |                           |                          | 151 - 177                      | 97.6 - 104

While AMD's flagship processor is still no match in single-threaded integer code, it matches a slightly higher-clocked Intel Xeon 7340 in multi-threaded integer performance. Single-threaded floating point performance is essentially the same (clock-for-clock), and when it comes to multi-threaded floating point performance on huge datasets, there is no stopping the best AMD chip: it is up to 50% faster than its competitor. It is interesting to note that the X7350 Xeon at 2.93GHz is not faster than its slower brother at 2.4GHz in SPECfp2006, clearly indicating a bottleneck.
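
The multi-threaded gap can be worked out directly from the rate columns of the quad-socket table. This is a rough sketch rather than an official SPEC calculation; the helper name `opteron_lead` is hypothetical and the figures are copied from the table above.

```python
# SPEC CPU2006 rate scores (base, peak) from the quad-socket table above.
rate = {
    "Opteron 8356": {"int": (160, 184), "fp": (143, 157)},
    "Xeon 7340": {"int": (157, 188), "fp": (100, 108)},
    "Xeon 7330": {"int": (151, 177), "fp": (97.6, 104)},
}


def opteron_lead(rival, suite):
    """Percentage lead of the Opteron 8356 over `rival`, as (base, peak)."""
    amd = rate["Opteron 8356"][suite]
    intel = rate[rival][suite]
    return tuple(100 * (a / i - 1) for a, i in zip(amd, intel))


for rival in ("Xeon 7340", "Xeon 7330"):
    for suite in ("int", "fp"):
        base, peak = opteron_lead(rival, suite)
        print(f"vs {rival} ({suite} rate): base {base:+.1f}%, peak {peak:+.1f}%")

# Largest floating point rate lead across both Xeons, base or peak.
best_fp_lead = max(
    lead
    for rival in ("Xeon 7340", "Xeon 7330")
    for lead in opteron_lead(rival, "fp")
)
```

With these numbers the integer rate results stay within a few percent of each other, while the floating point rate lead runs from about 43% to about 51%, which is where the "up to 50%" figure comes from.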

You probably guessed what the bottleneck in the Xeon system is. We used our multi-threaded, 64-bit Linux Stream binary (courtesy of Alf Birger Rustad) based on v2.4 of PathScale's C compiler, compiled with the following switches:

-Ofast -lm -static -mp

We tested with 16 threads.

Memory Performance Comparison (Stream, MB/s)
System            | Copy  | Scale | Add   | Triad | Average
Quad Opteron 8356 | 20867 | 20860 | 20892 | 20945 | 20891
Quad Xeon 7330    | 9778  | 8973  | 9008  | 9008  | 9192

No matter which Xeon 73xx you use, the best each core can hope for is less than 600MB/s of memory bandwidth. That is slightly better than a PIII 1GHz with 133MHz SDRAM! Considering that a current 2.4GHz Xeon is at least four times faster than a PIII at 1GHz, it is clear that this is a severe bottleneck that won't be solved until a Xeon "Nehalem" MP with CSI is available. Until then, Intel's Xeon MP faces a very capable competitor.
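
The "less than 600MB/s" figure follows from the Stream table once the total is split across the cores. A minimal sketch, assuming the Stream results are in MB/s and that the 16 threads map one-to-one onto the 16 cores of each quad-socket system; `average` and `per_core` are illustrative names only.

```python
THREADS = 16  # assumption: one Stream thread per core, 4 sockets x 4 cores

# Stream results from the table above (Copy, Scale, Add, Triad), assumed MB/s.
stream_mb_s = {
    "Quad Opteron 8356": (20867, 20860, 20892, 20945),
    "Quad Xeon 7330": (9778, 8973, 9008, 9008),
}


def average(values):
    """Arithmetic mean of the four Stream kernels."""
    return sum(values) / len(values)


def per_core(system):
    """Average Stream bandwidth available to each core, in MB/s."""
    return average(stream_mb_s[system]) / THREADS


for system in stream_mb_s:
    print(f"{system}: {average(stream_mb_s[system]):.0f} MB/s total, "
          f"{per_core(system):.0f} MB/s per core")
```

On these figures each Xeon core gets roughly 575MB/s, against roughly 1300MB/s per Opteron core.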
