Virtualization (ESX 3.5 Update 2/3)

As we discussed on the first page of this article, virtualization will be implemented on about half of the servers bought this year. Virtualization is thus the killer application, and a virtualization benchmark is the most important one available. Since we have not been able to carry out our own virtualization benchmarking (due to the BIOS issues described earlier), we turn to VMware's VMmark. It is a relatively reliable benchmark, as the number of "exotic tricks hardly used in the real world" (see SPECjbb) is very limited.

VMware VMmark is a benchmark of consolidation. Several virtual machines performing different tasks are consolidated together and called a tile. A VMmark tile consists of:

  • MS Exchange VM
  • Java App VM
  • Idle VM
  • Apache web server VM
  • MySQL database VM
  • SAMBA fileserver VM

The first three run on a Windows 2003 guest OS, the last three on SUSE SLES 10.


To understand why this benchmark is so important, just look at the number of tiles that a given machine can support:

VMware VMmark number of Tiles

The difference between the Opteron 8360 SE and the 8384 "Shanghai" is only 200MHz and 4MB of L3 cache. However, this small difference allows you to run 18 (!) extra virtual machines, or three extra tiles of six VMs each. Granted, it may require that you install more memory, but adding memory is cheaper than buying a server with more CPU sockets or adding yet another server. Roughly calculated, you could say that the new quad-core Opteron allows you to consolidate 27% more virtual servers on one physical machine, which is a significant cost saving.
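The arithmetic behind that 27% can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: the six VMs per tile come from the tile definition above, and the 18 extra VMs and the 27% gain come from the text. The baseline tile count is derived from those two numbers rather than read off the chart, and the helper name is ours, purely for illustration.

```python
# Back-of-the-envelope check of the consolidation numbers quoted in the text.
VMS_PER_TILE = 6      # one VMmark tile = six VMs (see the tile list)
EXTRA_VMS = 18        # extra VMs quoted for the 8384 over the 8360 SE
QUOTED_GAIN = 0.27    # "27% more virtual servers"


def implied_baseline_tiles(extra_vms, gain, vms_per_tile=VMS_PER_TILE):
    """Hypothetical helper: baseline tile count implied by the quoted gain."""
    extra_tiles = extra_vms / vms_per_tile
    return extra_tiles / gain


extra_tiles = EXTRA_VMS // VMS_PER_TILE
baseline = implied_baseline_tiles(EXTRA_VMS, QUOTED_GAIN)

print(f"extra tiles: {extra_tiles}")
print(f"implied baseline: about {baseline:.1f} tiles")
```

In other words, three extra tiles on a baseline of roughly 11 tiles is what gives the quoted gain of about 27%.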

Of course, the number of tiles that a physical server can accommodate provides only a coarse-grained performance measure, but an important one. This is one of the few times when a higher benchmark score translates directly into a cost reduction. Performance per VM is of course also very interesting. VMware explains how they translate the performance of the different workloads in the different tiles into one overall score:

After a benchmark run, the workload metrics for each tile are computed and aggregated into a score for that tile. This aggregation is performed by first normalizing the different performance metrics such as MB/second and database commits/second with respect to a reference system. Then, a geometric mean of the normalized scores is computed as the final score for the tile. The resulting per-tile scores are then summed to create the final metric.
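The aggregation VMware describes maps naturally onto a few lines of Python. What follows is a minimal sketch of the steps described (normalize, take the geometric mean per tile, sum over tiles), not VMware's actual scoring code. The workload names, the reference values and the function names are made up for illustration. The idle VM is left out on the assumption that it produces no throughput metric to normalize.

```python
from math import prod


def tile_score(measured, reference):
    """Geometric mean of one tile's metrics, each normalized to the reference system."""
    ratios = [measured[name] / reference[name] for name in reference]
    return prod(ratios) ** (1.0 / len(ratios))


def overall_score(tiles, reference):
    """Sum of the per-tile scores, which gives the final metric."""
    return sum(tile_score(tile, reference) for tile in tiles)


# Hypothetical reference-system throughput per workload (units differ per workload).
reference = {
    "mail": 100.0,
    "java": 2000.0,
    "web": 500.0,
    "database": 50.0,
    "fileserver": 10.0,
}

# Two made-up tiles: one matching the reference, one twice as fast on every workload.
tile_a = dict(reference)
tile_b = {name: 2.0 * value for name, value in reference.items()}

score = overall_score([tile_a, tile_b], reference)
print(f"overall score: {score:.2f}")
```

Because the per-tile scores are summed, a machine that sustains more tiles scores higher even if each tile runs a little slower, while the geometric mean keeps a single workload from dominating a tile's score.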

Let us take a look:

VMware VMmark

Thanks to the lower world switch times, higher clock speed, and larger cache, the new "Shanghai" Opteron 8384 improves on the already impressive scores of the AMD "Barcelona" 8356 by almost 43%. The only Intel CPU that comes somewhat close is the hex-core behemoth known as the Xeon X7460, which needs a lot more power. IBM's server performs a tiny bit better than Dell's thanks to its custom high-performance chipset.

It is clear that the Xeon 7350 is at the end of its life: it offers a little more than 2/3 of the performance of the best Opteron while using a lot more power. Even the latest improved stepping, the Xeon X5470 at 3.33GHz, cannot keep up with the new Opteron quad-core. The reason is simple: as the number of VMs increases, so do the bandwidth requirements and the number of world switches. That is exactly where the Opteron is far superior. It is game over here for Intel… until the Xeon 5570 2.93GHz arrives in March.


29 Comments

  • zpdixon42 - Wednesday, December 24, 2008

    DDR2-1067: oh, you are right. I was thinking of Deneb.

    Yes, performance/dollar depends on the application you are running, so what I am suggesting more precisely is that you compute some perf/$ metric for every benchmark you run. And even if the CPU price is less negligible compared to the rest of the server components, it is always interesting to look both at absolute perf and perf/$ rather than just absolute perf.
  • denka - Wednesday, December 24, 2008

    32-bit? 1.5Gb SGA? This is really ridiculous. Your tests should be bottlenecked by IO.
  • JohanAnandtech - Wednesday, December 24, 2008

    I forgot to mention that the database created is slightly larger than 1 GB. And we wouldn't be able to get >80% CPU load if we were bottlenecked by I/O.
  • denka - Wednesday, December 24, 2008

    You are right, this is a smallish database. By the way, when you report CPU utilization, would you take IOWait separately from CPU used? If taken together (which was not clear), it is possible to get 100% CPU utilization out of which 90% will be IOWait :)
  • denka - Wednesday, December 24, 2008

    Not to be negative: excellent article, by the way.
  • mkruer - Tuesday, December 23, 2008

    If/when AMD does release the Istanbul (K10.5 6-core), the Nehalem will again be relegated to second place for most HPC.
  • Exar3342 - Wednesday, December 24, 2008

    Yeah, by that time we will have 8-core Sandy Bridge 32nm chips from Intel...
  • Amiga500 - Tuesday, December 23, 2008

    I guess the key battleground will be Shanghai versus Nehalem in the virtualised server space...

    AMD need their optimisations to shine through.

    It's entirely understandable that you could not conduct virtualisation tests on the Nehalem platform, but unfortunate from the point of view that it may decide whether Shanghai is a success or failure over its life as a whole. As always, time is the great enemy! :-)
  • JohanAnandtech - Tuesday, December 23, 2008

    "you could not conduct virtualisation tests on the Nehalem platform"

    Yes. At the moment we have only 3 GB of DDR-3 1066. So that would make pretty poor virtualization benches indeed.

    "unfortunate from the point of view that it may decide whether Shanghai is a success or failure"

    Personally, I think this might still be one of Shanghai's strong points. Virtualization is about memory bandwidth, cache size and TLBs. Shanghai can't beat Nehalem's BW, but when it comes to TLB size it can make up a bit.
  • VooDooAddict - Tuesday, December 23, 2008

    With the VMware benchmark, it is really just a measure of the CPU / memory. Unless you are running applications with very small datasets where everything fits into RAM, the primary bottleneck I've run into is the storage system. I find it much better to focus your hardware funds on the storage system and use the company standard hardware for the server platform.

    This isn't to say the bench isn't useful. Just wanted to let people know not to base your VMware buildout solely on those numbers.
