Limitations to Keep in Mind When Virtualizing

Keeping the tasks of the Monitor and the VMkernel in mind, which aspects of your application are most easily affected by virtualization?

As it stands, VMware sees about an 80/19/1 division in application performance profiles. To put this more clearly, they estimate that 80% of all applications will notice very little to no performance impact from being virtualized, 19% will experience a noticeable performance hit, while 1% is deemed completely unfit for virtualization.

These numbers don't say a whole lot about your specific application, though, so here is a list of the parameters VMware looks at:

CPU usage: Just by looking at your task manager (or whichever monitoring tool you prefer), you can make a reasonable guess at what will happen when you virtualize. If you're seeing generally high CPU usage with very little kernel time, and you're not too worried about I/O, your application has a good chance of running fine in a virtual environment. Why is that?

User-mode CPU instructions can generally be run without any additional overhead. The Monitor can simply let these pass through to the CPU scheduler, because there's no danger of these instructions causing a mess for the other VMs. The dangerous instructions are the ones that require kernel mode to kick in, usually because the operating system in question is not aware of its virtualized state; these could prevent the other VMs from functioning properly. Different solutions have been developed to work around this, and it's important to realize which technique is used in which situation.
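Before diving into those techniques, it helps to put a number on the user/kernel split mentioned above, measured on the physical machine you're thinking of virtualizing. A minimal Python sketch along these lines, using the third-party psutil package, does the job; the 4:1 ratio it uses as a cut-off is purely an illustrative assumption on our part, not a VMware guideline.

    import psutil

    # Sample overall CPU usage for 10 seconds and split it into user and kernel time.
    times = psutil.cpu_times_percent(interval=10)
    user = times.user      # percentage spent on user-mode instructions
    kernel = times.system  # percentage spent in kernel mode (privileged instructions)

    print(f"user: {user:.1f}%  kernel: {kernel:.1f}%")

    # Illustrative rule of thumb only: mostly user-mode work is the easy case for
    # the Monitor, since those instructions can be passed straight to the scheduler.
    if kernel < 1 or user / kernel > 4:
        print("Mostly user-mode work: a good virtualization candidate, CPU-wise.")
    else:
        print("Significant kernel time: expect more Monitor overhead; test first.")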

Luckily, for ESX 4 Scott Drummonds provided a nice overview of which combination of CPU virtualization (binary translation, noted as BT, or hardware-assisted AMD-V/VT-x) and memory virtualization (SPT, RVI or EPT) is used in which situation, allowing us to check and make sure our VMs are using the best our hardware has to offer.

AMD Virtualization Tech Use Summary

Configuration | Barcelona, Phenom and Newer | AMD64 Pre-Barcelona | No AMD64
Fault Tolerance Enabled | AMD-V + SPT | Won't run | Won't run
64-bit guests | AMD-V + RVI | BT + SPT | Won't run
VMI Enabled | BT + SPT | BT + SPT | BT + SPT
OpenServer, UnixWare, OS/2 | AMD-V + RVI | BT + SPT | BT + SPT
32-bit Linux and 32-bit FreeBSD | AMD-V + RVI | BT + SPT | BT + SPT
32-bit Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008 | AMD-V + RVI | BT + SPT | BT + SPT
Windows 2000, Windows NT, DOS, Windows 95, Windows 98, Netware, 32-bit Solaris | BT + SPT | BT + SPT | BT + SPT
32-bit guests | AMD-V + RVI | BT + SPT | BT + SPT

Intel Virtualization Tech Use Summary

Configuration | Core i7 | 45nm Core 2 VT-x | 65nm Core 2 VT-x, Flex Priority | 65nm Core 2 VT-x, No Flex Priority | P4 VT-x, EM64T | No VT-x | No EM64T
Fault Tolerance Enabled | VT-x + SPT | VT-x + SPT | VT-x + SPT | VT-x + SPT | Won't run | Won't run | Won't run
64-bit guests | VT-x + EPT | VT-x + SPT | VT-x + SPT | VT-x + SPT | VT-x + SPT | Won't run | Won't run
VMI Enabled | BT + SPT | BT + SPT | BT + SPT | BT + SPT | BT + SPT | BT + SPT | BT + SPT
OpenServer, UnixWare, OS/2 | VT-x + EPT | VT-x + SPT | VT-x + SPT | VT-x + SPT | VT-x + SPT | BT + SPT | BT + SPT
32-bit Linux and 32-bit FreeBSD | VT-x + EPT | VT-x + SPT | BT + SPT (*) | BT + SPT (*) | BT + SPT (*) | BT + SPT | BT + SPT
32-bit Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008 | VT-x + EPT | VT-x + SPT | VT-x + SPT | BT + SPT (*) | BT + SPT (*) | BT + SPT | BT + SPT
Windows 2000, Windows NT, DOS, Windows 95, Windows 98, Netware, 32-bit Solaris | BT + SPT (*) | BT + SPT (*) | BT + SPT (*) | BT + SPT (*) | BT + SPT (*) | BT + SPT | BT + SPT
32-bit guests | VT-x + EPT | VT-x + SPT | VT-x + SPT | VT-x + SPT | VT-x + SPT | BT + SPT | BT + SPT

(*) denotes the capability of dynamically switching to VT-x mode when the guest requires it.

For more information on Shadow Page Tables (SPT), Extended Page Tables (EPT), Rapid Virtualization Indexing (RVI), Intel Hardware Virtualization (Intel VT-x) and AMD Hardware Virtualization (AMD-V), refer to our previous articles on these subjects.
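If you want to verify which combination your own VMs actually ended up with, one place to look is the VM's vmware.log file, where the monitor records its execution and MMU mode at power-on. The path and the keywords in the sketch below are assumptions on our part (the exact log wording differs between ESX releases), so adapt them to your environment.

    # Minimal sketch: scan a VM's vmware.log for lines that mention the monitor's
    # execution mode (BT, VT-x/AMD-V) and MMU mode (SPT, EPT/RVI).
    # Both the path and the keywords below are assumptions; adjust them as needed.
    log_path = "/vmfs/volumes/datastore1/myvm/vmware.log"  # hypothetical example path
    keywords = ("monitor mode", "virtual exec", "virtual mmu", "hv settings")

    with open(log_path, errors="replace") as log:
        for line in log:
            if any(key in line.lower() for key in keywords):
                print(line.rstrip())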

Network bandwidth: To VMware, bandwidth is the biggest reason why an application could prove troublesome when virtualized. The theoretical throughput ESX can maintain is claimed to be about 16Gbps. Adding in a big dash of realism, VMware places applications that require over 1Gbps in the "troublesome" category: virtualization is possible, but they don't rule out a performance hit. What we have noticed ourselves is a great amount of CPU overhead on the side of the initiator when teaming multiple NICs to increase bandwidth. Teaming up to three Gigabit NICs causes a CPU load of about 40% inside a Windows Server 2003 VM running on four vCPUs of a 2.93GHz Nehalem processor. This overhead is greatly reduced when updating to Windows Server 2008 (to about 15%), and is further reduced to less than 5% when using ESX's own iSCSI initiator. Many thanks to Tijl Deneut of the Sizing Servers Lab for this tip.
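As a back-of-the-envelope check against that 1Gbps guideline, a calculation along these lines can help decide whether your application lands in the "troublesome" category; the throughput figure used here is purely a hypothetical example, so plug in your own measurements.

    import math

    # Hypothetical example: an application sustaining 180 megabytes/s of network traffic.
    sustained_mb_per_s = 180.0
    gbps = sustained_mb_per_s * 8 / 1000.0   # megabytes/s -> gigabits/s

    # Nominal number of gigabit NICs needed, before any teaming/CPU overhead.
    nics_needed = max(1, math.ceil(gbps))

    print(f"Sustained throughput: {gbps:.2f} Gbps (~{nics_needed} gigabit NIC(s))")
    if gbps > 1.0:
        print("Above VMware's rough 1Gbps mark: budget for teaming overhead, or test first.")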

General I/O: This is the one thing VMware will admit can make a few applications simply "unvirtualizable". ESX at this point can maintain roughly 100,000 IOPS (vSphere is rumored to push this up to 350,000) and a maximum of 600 disks, which is a relatively impressive number considering that a hefty Exchange installation requires something to the tune of 5000 IOPS. Counting roughly 0.3 IOPS per user, that would account for over 16,000 users, though of course this number will vary depending on the setup and user profiles. However, there are applications that simply require more IOPS, and they'll experience quite a lot of trouble under ESX. Moreover, changes in the storage layout might turn up some unexpected results, but we'll have more on that in the second article of this series.
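The arithmetic behind that user count is simple enough to sketch out; the figures below are the rough numbers quoted above, not measurements, so substitute your own IOPS profile before drawing conclusions.

    # Rough capacity arithmetic using the figures quoted above.
    exchange_iops_budget = 5000    # rough IOPS needed by a hefty Exchange installation
    iops_per_user = 0.3            # rough per-user estimate; varies with user profile
    esx_iops_ceiling = 100000      # approximate IOPS ESX can sustain at this point

    users_supported = exchange_iops_budget / iops_per_user
    share_of_host = exchange_iops_budget / esx_iops_ceiling

    print(f"~{users_supported:,.0f} users fit in a {exchange_iops_budget} IOPS budget")
    print(f"That budget is roughly {share_of_host:.0%} of ESX's ~{esx_iops_ceiling:,} IOPS ceiling")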

Memory requirements: Finally, for very large database systems, there might be an issue here. In ESX 3.5, it is only possible to allocate as much as 64GB of RAM to a single VM. While plenty for most deployments, this might make virtualization of large-scale database systems troublesome. Should this be the case, remember that ESX 4.0 ups this limit to 255GB per VM, allowing even the most memory-hungry databases an impressive amount of leeway.

Keeping these things in mind, however, there are some ways to get the most out of your virtualized environment…

A Quick Overview of ESX Things that Can Improve General Performance
Comments

  • najames - Tuesday, June 23, 2009

    This is perfect timing for stuff going on at work. I'd like to see part 2 of this article.
  • - Wednesday, June 17, 2009

    I am very curious how VMware affects timings during the logging of streaming data. Is there a chance that some light could be shed on this topic?

    I would like to use VMware to create a clean platform within which to collect data. I am, however, very skeptical about how this is going to change the processing of the data (especially in regard to timings).

    Thanks for any help in advance.
  • KMaCjapan - Wednesday, June 17, 2009

    Hello. First off, I wanted to say I enjoyed this write-up. For those out there looking for further information on this subject, VMware recently released approximately 30 sessions from VMworld 2008 and VMworld Europe 2009 to the public free of charge; you just need to sign up for an account to access the information.

    The following website lists all of the available sessions

    http://vsphere-land.com/news/select-vmworld-sessio...

    and the next one is the direct link to the VMworld 2008 "ESX Server Best Practices and Performance" session, which runs approximately one hour.

    http://www.vmworld.com/docs/DOC-2380

    Enjoy.

    Cheers
    K-MaC
  • yknott - Tuesday, June 16, 2009

    Great writeup Liz. There's one more major setup issue that I've run into numerous times during my ESX installations.

    It has to do with IRQ sharing causing numerous interrupts on CPU0. Basically, ESX handles all interrupts (network, storage, etc.) on CPU0 instead of spreading them out to all CPUs. If there is IRQ sharing, this can peg CPU0 and cause major performance issues. I've seen 20% performance degradation due to this issue. For me, the way to solve this has been to disable the usb-uhci driver in the Console OS.

    You can find out more about this issue here: http://www.tuxyturvy.com/blog/index.php?/archives/...

    and http://kb.vmware.com/selfservice/microsites/search...

    This may not be an issue on "homebuilt" servers, but it's definitely cropped up for me on all HP servers and a number of IBM x series servers as well.
  • LizVD - Wednesday, June 17, 2009

    Thanks for that tip, yknott, I'll look into including that in the article after researching it a bit more!
  • badnews - Tuesday, June 16, 2009

    Nice article, but can we get some open-source love too? :-)

    For instance, I would love to see an article that compares the performance of, say, ESX vs. open-source technologies like Xen and KVM! Also, how about para-virtualised guests? If you are targeting performance (as I think most AT readers are), I would be interested in what sort of platforms are best placed to handle them.

    And how about some I/O comparisons? Alright the CPU makes a difference, but how about RAID-10 SATA/SAS vs RAID-1 SSD on multiple VMs?
  • LizVD - Wednesday, June 17, 2009

    We are actually working on a completely open-source version of our vApus Mark bench, and to give it a proper test drive, we're using it to compare OpenVZ and Xen performance, which my next article will be about (after part 2 of this one comes out).

    I realize we've been "neglecting" the open source side of the story a bit, so that is the first thing I am looking into now. Hopefully I can include KVM in that equation as well.

    Thanks a lot for your feedback!
  • Gasaraki88 - Tuesday, June 16, 2009

    Thanks for this article. As an ESX admin, this is very informative.
  • mlambert - Tuesday, June 16, 2009

    "We must note here that we've found a frame size of 4000 to be optimal for iSCSI, because this allows the blocks to be sent through without being spread over separate frames."

    Can you post testing & results for this? It would also be interesting to know if 9000 was optimal for NFS datastores (as NFS is what most smart shops are using anyway...).
  • Lord 666 - Tuesday, June 16, 2009

    Excellent write up.
