Real-world virtualization benchmarking: the best server CPUs compared

Author: Johan De Gelas

by Johan De Gelas on May 21, 2009 3:00 AM EST

Posted in
IT Computing


vApus Mark I vs. VMmark

By now, it should be clear that vApus Mark I is not meant to replace VMmark or VConsolidate. The largest difference is that VMmark tries to mimic the "average" virtualized datacenter, while vApus Mark I focuses on the heavier, service-oriented applications. vApus Mark I therefore covers a smaller part of the market, whereas the creators of VMmark have invested a lot of thought into getting a representative mix of the typical applications found in a datacenter. We have listed the most important differences below.

vApus Mark I compared to VMmark
	vApus Mark I	VMmark
Goal	Virtualization benchmarking across Guest OS, Hypervisor, and Hardware	Measuring what the best hardware is for ESX
Reproducible by third parties	No; for now it's only available to AnandTech and Sizing Server Lab	Yes
Modeling	"Harder to virtualize", "heavy duty" applications	A balanced mix of virtualized applications in the "typical" datacenter
VMs	Large "heavy duty" VMs; 4GB with 4 VCPUs	Small VMs 0.5-2GB, 1-2 VCPUs
Market coverage	Small but important part of the market	Large part of datacenter market
Relevance to the real world	Uses real-world applications	Uses industry-standard benchmarks

The advantages of vApus Mark I are that we use real-world applications and test them as if they were loaded by real people. The advantages of VMmark are that it is available to everyone and that its mix of applications is closer to what is found in the majority of datacenters, whereas vApus Mark I focuses more on heavy-duty applications.

There is one other small difference between existing benchmarks such as VMmark and VConsolidate and our vApus Mark I virtual test. VMmark and VConsolidate add groups of VMs (called tiles or CSUs) until the benchmark score no longer increases, or until all the system processors are fully utilized. Our virtualization benchmark tries to get close to 100% CPU load much sooner. This is because our VMs require relatively large amounts of memory: each VM is configured with 4GB. If we used a throttled load as VMmark and VConsolidate do, we would need massive amounts of memory to measure servers with 16 cores or more. The six VMs that make up a VMmark tile take only 5GB, while our four VMs require 16GB. Our current monitoring shows that this benchmark could run in 10-11GB, and thanks to VMware's shared memory technique probably in less than 9GB. With four VMs we can test up to 12 physical CPUs, or 16 logical CPUs (8 physical + 8 SMT). We need eight VMs (or two "tiles") to fully stress 16 to 24 physical cores.
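
The tile arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name tile_footprint and its parameters are assumptions made for this example and are not part of vApus, VMmark, or VConsolidate. The per-tile figures (four VMs with 4GB and 4 VCPUs each, versus six VMs in 5GB) are the ones quoted in the text.

```python
def tile_footprint(tiles, vms_per_tile, gb_per_tile, vcpus_per_vm=None):
    """Return total VMs, configured memory (GB) and VCPUs for a number of tiles.

    Hypothetical helper for illustration only. vcpus_per_vm may be None when
    the article gives no single per-VM figure (as is the case for VMmark).
    """
    total_vms = tiles * vms_per_tile
    footprint = {
        "vms": total_vms,
        "memory_gb": tiles * gb_per_tile,
        "gb_per_vm": gb_per_tile / vms_per_tile,
    }
    if vcpus_per_vm is not None:
        footprint["vcpus"] = total_vms * vcpus_per_vm
    return footprint


# Figures taken from the article text and table.
vapus_one_tile = tile_footprint(1, vms_per_tile=4, gb_per_tile=16, vcpus_per_vm=4)
vapus_two_tiles = tile_footprint(2, vms_per_tile=4, gb_per_tile=16, vcpus_per_vm=4)
vmmark_one_tile = tile_footprint(1, vms_per_tile=6, gb_per_tile=5)

for label, fp in [
    ("vApus Mark I, 1 tile", vapus_one_tile),
    ("vApus Mark I, 2 tiles", vapus_two_tiles),
    ("VMmark, 1 tile", vmmark_one_tile),
]:
    print(label, fp)
```

Under these figures, two vApus Mark I tiles already call for 32GB of configured memory and 32 VCPUs, while two VMmark tiles need about 10GB. That gap is the reason the text gives for driving each tile close to full CPU load instead of adding many lightly loaded tiles.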

66 Comments

GotDiesel - Thursday, May 21, 2009 - link
"Yes, this article is long overdue, but the Sizing Server Lab proudly presents the AnandTech readers with our newest virtualization benchmark, vApus Mark I, which uses real-world applications in a Windows Server Consolidation scenario."

spoken with a mouth full of microsoft cock

where are the Linux reviews ?

not all of us VM with windows you know..
JohanAnandtech - Thursday, May 21, 2009 - link
A minimum form of politeness would be appreciated, but I am going to assume you were just disappointed.

The problem is that right now the calling circle benchmark runs half as fast on Linux as it does on Windows. What is causing Oracle to run slower on Linux than on Windows is a mystery even to some of the experienced DBAs we have spoken to. We either have to replace that benchmark with an alternative (probably Sysbench) or find out what exactly happened.

When you construct a virtualized benchmark, it is not enough just to throw in a few benchmarks and VMs; you really have to understand the benchmark thoroughly. There are enough half-baked benchmarks already on the internet that look like Swiss cheese because there are so many holes in the methodology.
JarredWalton - Thursday, May 21, 2009 - link
Page 4: vApus Mark I: the choices we made

"vApus mark I uses only Windows Guest OS VMs, but we are also preparing a mixed Linux and Windows scenario."

Building tests, verifying tests, running them on all the servers takes a lot of time. That's why the 2-tile and 3-tile results are not yet ready. I suppose Linux will have to wait for Mark II (or Mark I.1).
mino - Thursday, May 21, 2009 - link
What you did so far is great. No more words needed.

What I would like to see is vApus Mark I "small" where you make the tiles smaller, about 1/3 to 1/4 of your current tiles.
Tile structure shall remain similar for simplicity, they will just be smaller.

When you manage to have 2 different tile sizes, you shall be able to consider 1 big + 1 small tile as one "condensed" tile for general score.

Having 2 reference points will allow for evaluating "VM size scaling" situations.
JohanAnandtech - Sunday, May 24, 2009 - link
Can you elaborate a bit? What do you mean by "1/3 of my current tile"? A tile = 4 VMs. Are you talking about a small mem footprint or the number of VCPUs?

Are you saying we should test with a Tile with small VMs and then test afterwards with the large ones? How do you see such "VM scaling" evaluation?
mino - Monday, May 25, 2009 - link
Thanks for response.

By 1/3 I mean smaller VMs. Mostly from the load POV. Probably 1/3 load would go for 1/2 memory footprint.

The point being that currently there is only a single data point with a specific load size per tile/per VM.

By "VM scaling" I mean I would like to see what effect smaller loads would have on overall performance.

I suggest 1/3 or 1/4 the load to get a measurable difference while remaining within reasonable memory/VM scale.

In the end, if you get similar overall performance from 1/4 tiles, it may not make sense to include this in future.
Even then the information that your benchmark results can be safely extrapolated to smaller loads would be of great value by itself.
mino - Monday, May 25, 2009 - link
Eh, that last text of mine looks like nice gibberish...
Clarification needed:

To be able to run more tiles/box, a smaller memory footprint is a must.
With a smaller mem footprint, smaller DBs are a must.

The end results may not be directly comparable but shall be able to give some reference point, correctly interpreted.

Please let me know if this makes sense to you.
There are multiple dimensions to this. I may easily be on the imaginary branch :)
ibb27 - Thursday, May 21, 2009 - link
Can we have a chance to see benchmarks for Sun Virtualbox which is Opensource?
winterspan - Tuesday, May 26, 2009 - link
This test is misleading because you are not using the latest version of VMware that supports Intel's EPT. Since AMD's version of this is supported in the older version, the test is not at all a fair representation of their respective performance.
Zstream - Thursday, May 21, 2009 - link
Can someone please perform a Win2008 RC2 Terminal Server benchmark? I have been looking everywhere and no one can provide that.

If I can take this benchmark and tell my boss this is how the servers will perform in a TS environment please let me know.
