Watch CPU Ready metrics to regulate VM overcommit levels

Administrators can get more out of their servers by overcommitting CPU resources and balancing VM distribution.

One of the great things about virtualization is the ability to abstract and share resources. This abstraction also enables the overcommitment of resources -- the practice of provisioning more logical resources than are physically available. Not all servers within a data center require full access to all available resources at the same time, and overcommitment allows organizations to better use these resources and increase server consolidation.

However, the level of overcommitment of these resources is not a hard line in today's ever-changing environments. Deciding how much you can overcommit is a balancing act between changing workload demands and consolidation goals. The overcommitment level is not a static value, because most business and workload demands are not static. Rather, it should be a fluid value that depends on a wide range of variables, including the type of applications and even the time of day. In this article, we'll look at CPU overcommitment and examine a few key indicators that can help you manage that fluid line.

Calculating consolidation

You can find various calculators showing different values of VMs per core. Numbers can range from three VMs per CPU core to six or seven. There are a couple of issues with this type of formula if you don't look at it closely. First, by the raw math, a dual-CPU rack server with six physical cores per socket and hyper-threading enabled gives you 24 countable (logical) cores for that box. The calculator-based formula can give you an estimate of four VMs per core for a total of 96 VMs per rackmount server -- four VMs per core times 24 cores per rackmount server. It's unlikely you'll see that level of consolidation for a few reasons.

One of the first questions to address is what kind of workload you will be running. In a heavy transaction-based environment, you may find yourself with multiple VMs requiring multiple cores. If most of your VMs run with two or four virtual CPUs (vCPUs), you have to take that into account. Your initial calculation of 96 VMs is now reduced to 24 VMs if each VM has four vCPUs, but unfortunately that is not the end of the formula. Four VMs to one core assumes a 0% to 50% usage range, with an average of 25% -- four VMs per core. However, what if your average is closer to 50% usage due to application demands? This moves your VM per core average from four to two. With 24 cores, that leaves 48 vCPUs to hand out, or 12 four-vCPU VMs, so the estimate drops from the original 96 VMs per rackmount server to 12 VMs.
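The arithmetic above can be reproduced in a few lines. This is a minimal sketch rather than a sizing tool: the function name and its parameters are made up for illustration, and it assumes the calculator's "VMs per core" ratio really means single-vCPU VMs per countable core.

```python
def estimate_vms_per_host(sockets, cores_per_socket, hyperthreading,
                          vms_per_core, vcpus_per_vm):
    """Rough VM count for one host, following the article's arithmetic.

    Hypothetical helper: it ignores memory, storage, and I/O limits and
    treats every hyper-thread as a countable core, as the article does.
    """
    countable_cores = sockets * cores_per_socket * (2 if hyperthreading else 1)
    # The calculator ratio is read as "single-vCPU VMs per core".
    vcpu_slots = countable_cores * vms_per_core
    # Wider VMs consume several slots each.
    return vcpu_slots // vcpus_per_vm


# Dual-socket, six-core host with hyper-threading: 24 countable cores.
print(estimate_vms_per_host(2, 6, True, vms_per_core=4, vcpus_per_vm=1))  # 96
print(estimate_vms_per_host(2, 6, True, vms_per_core=4, vcpus_per_vm=4))  # 24
print(estimate_vms_per_host(2, 6, True, vms_per_core=2, vcpus_per_vm=4))  # 12
```

The three calls correspond to the optimistic calculator figure, the same ratio with four-vCPU VMs, and the case where average usage is closer to 50%.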

CPU Ready Time metrics

That is a pretty large range to cover. Which calculation is correct: 12 VMs or 96 VMs? The answer is both. Overcommitment is a fluid line, and math can only give you a range to work with. The real key here is monitoring -- specifically, CPU Ready Time. This value is your best friend when it comes to working with CPU overcommitment. The CPU Ready value is the amount of time your VMs are ready to use the CPU but are unable to because the CPU resources are being used elsewhere -- by another VM, for example. If your VMs show CPU Ready Time below 2% to 3%, they have ready access to CPU resources. If that value increases to 5% or higher, your VMs are starting to become CPU bound. Once you hit double digits -- 10% or higher -- in your CPU Ready Time, you're in real trouble. At that point you will need to reduce the load on your host or end up with significant performance problems.
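The thresholds above lend themselves to a simple triage script. The sketch below is illustrative only: the function name, the status labels, and the sample VM names and readings are invented, and the "watch" band between 3% and 5% is an assumption, since the article does not say how to treat that gap. It expects CPU Ready already expressed as a percentage.

```python
def classify_cpu_ready(ready_pct):
    """Map a CPU Ready percentage to a rough status label.

    Cut-offs follow the article: under about 3% is fine, 5% or more means
    the VM is becoming CPU bound, 10% or more is real trouble.
    The 3% to 5% "watch" band is an assumption made for this sketch.
    """
    if ready_pct < 0:
        raise ValueError("CPU Ready percentage cannot be negative")
    if ready_pct >= 10.0:
        return "critical"
    if ready_pct >= 5.0:
        return "cpu-bound"
    if ready_pct >= 3.0:
        return "watch"
    return "healthy"


# Hypothetical readings, e.g. collected from your monitoring tool.
samples = {"web-01": 1.8, "app-02": 4.2, "db-01": 6.5, "batch-07": 12.0}
for vm, pct in samples.items():
    print(f"{vm}: {pct:.1f}% ready -> {classify_cpu_ready(pct)}")
```

A check like this is most useful when run over a trend rather than a single sample, because overcommitment pressure varies with time of day.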

With possible contention and rising CPU Ready Times comes the opportunity to make use of shares and resource pools. The important thing to remember is that shares and resource pools will not give you additional free resources; they only decide who gets the existing resources when demand exceeds supply. Yes, you can increase the resources available to some VMs, but it comes at a cost to other VMs. Shares and resource pools are designed to help when you have contention -- though remember that overcommitment, by its nature, makes resource contention possible.
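To see why shares redistribute rather than create capacity, consider a simplified proportional model. This is a sketch under stated assumptions, not how any particular hypervisor schedules CPU: it assumes every VM is demanding CPU at once, it ignores reservations and limits, and the function name, VM names, share values, and the 12,000 MHz capacity are all invented for illustration.

```python
def split_by_shares(capacity_mhz, shares):
    """Divide contended CPU capacity in proportion to each VM's shares.

    Simplified model: all VMs are assumed to want more CPU than they get,
    and reservations and limits are ignored.
    """
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("total shares must be positive")
    return {vm: capacity_mhz * s / total_shares for vm, s in shares.items()}


capacity = 12000  # hypothetical contended capacity in MHz

before = split_by_shares(capacity, {"prod-db": 2000, "test-web": 1000, "dev-build": 1000})
after = split_by_shares(capacity, {"prod-db": 4000, "test-web": 1000, "dev-build": 1000})

print(before)  # prod-db gets half, the other two a quarter each
print(after)   # prod-db gains only what the other two give up
print(sum(before.values()) == sum(after.values()))  # total capacity is unchanged
```

Doubling the shares of one VM raises its slice, but the total stays the same, which is the cost to other VMs described above.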

This article is the first in a three-part series about overcommit technologies and tips for managing resources in a virtualized data center. Read part two of this series for strategies to balance memory overcommit and mitigate risk. Read part three for advice on how to cut waste with storage overcommitment.

Not all VMs are created equal. As you build your virtual infrastructure, plan for some overcommitment of CPU resources by balancing CPU-intensive VMs with lower-demand VMs, as well as critical VMs with low-priority VMs. By setting the share value for your VMs or taking advantage of resource pools, you can allow your production servers to pull resources from lower-priority VMs if needed during times of contention. If you don't mix VMs on the same server, then you cannot squeeze out additional resources in times of need because every workload will be critical. If you cannot afford to overcommit, there's a good chance you're not using your resources effectively. Overcommitment is safe as long as you use CPU Ready Times as your guide and take care to place VMs so that you can pull additional resources if needed.
