3ds Max 9 (32-bit Windows)
We tested with the 32-bit version of 3ds Max 9, which has improvements that help multi-core systems but is not as aggressively tuned for SSE as LINPACK and zVisuel. We used the "architecture" scene, which has been a favorite benchmarking scene for years. All tests used 3ds Max's default scanline renderer with SSE support enabled, rendering at HD 720p (1280x720) resolution. We measured the time it takes to render ten frames (frames 20 to 29).
As promised, we profiled our different benchmarks to understand them better. We performed profiling with AMD's CodeAnalyst; VTune profiling will follow later. 3ds Max runs four modules when you render:
To keep things simple, we summarized our findings with a weighted average over all modules.
| 3ds Max Profiling (profile) | Total |
| --- | --- |
| Average IPC (on Opteron 2350) | 1 |
| Instruction mix | |
| Floating point | 39% |
| SSE | 12% |
| Branches | 13% |
| L1 data cache ratio | 0.56 |
| L1 instruction cache ratio | 0.27 |
| Performance indicators on Opteron 2350 | |
| Branch misprediction | 6% |
| L1 data cache miss | 1% |
| L1 instruction cache miss | 5% |
| L2 cache miss | 0% |
As you can see, 3ds Max is mostly about floating-point performance, with a small share of SSE instructions. It runs almost entirely out of the L1 and L2 caches of our CPUs. To make the graph easier to read, we did not report our results in the classic way (rendering time) but expressed them in images rendered per hour (10 images * 3600 seconds divided by the render time in seconds). Higher is therefore better.
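The conversion from render time to images per hour can be sketched in a few lines of Python. This is only an illustration of the formula quoted above: the function name is ours, and the 720-second render time is a made-up placeholder, not a measured result.

```python
def images_per_hour(render_time_s: float, frames: int = 10) -> float:
    """Convert the time needed to render `frames` images into images per hour."""
    if render_time_s <= 0:
        raise ValueError("render time must be positive")
    # 3600 seconds per hour; the benchmark renders ten frames (20 to 29)
    return frames * 3600.0 / render_time_s


# Placeholder input for illustration only (not a measured result):
# ten frames in 720 seconds
print(images_per_hour(720.0))  # 50.0
```

Because the score is inversely proportional to render time, halving the render time doubles the images-per-hour figure.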

The Xeon 5472 is about 8% faster than its older brother and widens the gap with the AMD armada. We included quite a few results from older tests. This benchmark focuses on the CPU; chipset and RAM choices don't impact performance much. Interestingly, the 2GHz Opteron 2350 is about as fast as four 2.4GHz single-core Opterons. Thus, in software with a "small dash" of SSE, the new architecture is about 20% faster clock for clock. If we extrapolate our AMD quad-core results to 3GHz, the result would be about 59 images per hour, which indicates that AMD's newest is about 10% slower than Intel clock for clock. That is no real surprise anymore: FLOPS showed us that the raw x87 FP and SSE power of AMD's latest architecture is slightly lower than that of the newest Xeon. It can only overpower the Xeon 53xx if enough divisions are involved. AMD's Barcelona architecture will only show a real advantage in bandwidth-limited FP situations such as SPECfp2006 and many HPC applications.
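The clock-for-clock comparison rests on a simple linear extrapolation, sketched below in Python. It assumes the score scales proportionally with clock speed, which is plausible only because this benchmark is CPU-bound and cache-resident. The function names and all numbers in the example are hypothetical placeholders, not results from our tests.

```python
def scale_to_clock(score: float, clock_ghz: float, target_ghz: float) -> float:
    """Linearly extrapolate a score measured at clock_ghz to target_ghz.

    Assumption: performance scales proportionally with clock speed.
    """
    if clock_ghz <= 0 or target_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return score * target_ghz / clock_ghz


def relative_deficit(score: float, reference: float) -> float:
    """Fraction by which `score` trails `reference` (0.10 means 10% slower)."""
    if reference <= 0:
        raise ValueError("reference score must be positive")
    return 1.0 - score / reference


# Placeholder inputs for illustration only (not measured results):
# a 2.0GHz part scoring 40 images per hour, extrapolated to 3.0GHz
extrapolated = scale_to_clock(40.0, clock_ghz=2.0, target_ghz=3.0)
print(extrapolated)

# Compare against a hypothetical 3.0GHz competitor score
print(round(relative_deficit(extrapolated, reference=66.0), 3))
```

Any real workload that touches memory more heavily would scale less than linearly, so an extrapolation like this is best read as an upper bound.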
