Virtualization (ESX 3.5 Update 2/3)
As we discussed in the first page of this article, virtualization will be implemented on about half of the servers bought this year. Virtualization is thus the killer application and the most important benchmark available. Since we have not been able to carry out our own virtualization benchmarking (due to the BIOS issues described earlier), we turn to VMware's VMmark. It is a relatively reliable benchmark as the number of "exotic tricks hardly used in the real world" (see SPECjbb) are very limited.
VMware VMmark is a benchmark of consolidation. Several virtual machines performing different tasks are consolidated together and called a tile. A VMmark tile consists of:
The first three run on a Windows 2003 guest OS, the last three on SUSE SLES 10.

To understand why this benchmark is so important just look at the number of tiles that a certain machine can support:

The difference between the Opteron 8360 SE and the 8384 "Shanghai" is only 200MHz and 4MB L3 cache. However, this small difference allows you to run 18 (!) extra virtual machines. Granted, it may require that you install more memory, but adding memory is cheaper than buying a server with more CPU sockets or adding yet another server. Roughly calculated you could say that the new quad-core Opteron allows you to consolidate 27% more virtual servers on one physical machine, which is a significant cost saving.
Of course, the number of tiles that a physical server can accommodate provides only a coarse-grain performance measure, but an important one. This is one of the few times where a higher benchmark score directly translates to a cost reduction. Performance per VM is of course also very interesting. VMware explains how they translate the performance of each different workload in different tiles into one overall score:
After a benchmark run, the workload metrics for each tile are computed and aggregated into a score for that tile. This aggregation is performed by first normalizing the different performance metrics such as MB/second and database commits/second with respect to a reference system. Then, a geometric mean of the normalized scores is computed as the final score for the tile. The resulting per-tile scores are then summed to create the final metric.
Let us take a look:

Thanks to the lower world switch times, higher clock speed, and larger cache, the new "Shanghai" Opteron 8384 improves the already impressive scores of the AMD "Barcelona" 8356 by almost 43%. The only Intel that comes somewhat close is the hex-core behemoth known as the Xeon X7460, which needs a lot more power. IBM is capable of performing a tiny bit better than Dell thanks to its custom high performance chipset.
It is clear that the Xeon 7350 is at the end of its life: it offers a little more than 2/3 of the performance of the best Opteron while using a lot more power. Even the latest improved stepping, the Xeon X5470 at 3.33GHz, cannot keep up with the new Opteron quad-core. The reason is simple: as the number of VMs increase, so do the bandwidth requirements and the amount of world switches. That is exactly where the Opteron is far superior. It is game over here for Intel… until the Xeon 5570 2.93GHz arrives in March.
|
||||

February 9, 2010
February 8, 2010