Every virtual machine must be provisioned with the processors, memory and storage that a VM instance and its workload require for normal operation. Resource demands vary depending on the workload and its activity level, so provisioning a VM with the right amount of computing resources can be a tricky proposition. Give the VM too few resources and the workload may underperform, become unstable, or even crash outright. Give it too many resources and the excess computing potential may be wasted, costing the business money. IT administrators must be adept at detecting overprovisioned VMs and taking the right steps to adjust resource use.
Why overprovisioning happens
There are plenty of times when admins initially get provisioning wrong. The workload -- starved for processing or memory -- founders, and it clearly needs more resources to get the job done. But there's a difference between supplying an adequate level of resources and throwing additional resources at a workload that doesn't need them.
Overprovisioning often occurs because admins simply don't know what resources are necessary and appropriate for a given VM, especially as its load conditions change. Sometimes admins recognize that the VM's performance may be impaired by inadvertent resource starvation. The natural reaction is to overcompensate to guarantee that the problem is rectified, and also to prevent the problem from haunting admins again.
This knee-jerk response is a poor practice. It suggests a general lack of application understanding, planning and testing. Proper testing helps establish resource levels or boundaries before the workload is deployed in production. Some IT workers also cling to the false notion that more resources always equate to better workload performance.
"Customers and IT professionals might feel that adding processing power and memory will improve a VM's performance," said Scott Gorcester, CEO of VirtualQube, a cloud services provider. "But testing and proper analysis of VM and application performance will show that in some cases there is a sweet spot where the systems run best and adding more resources either has no effect, or even has a negative effect."
For example, allocating additional vCPUs might seem free, but some software licenses are affected by processor counts, and adding processor power can trigger unexpected license fees that inadvertently raise the operating cost of the VM. Adding memory to a VM can lower the total number of VMs that a server can support. This limits workload consolidation initiatives, impairs workload balancing schemes and leads a business to buy more servers or storage than required. This, in turn, drives higher costs for maintenance and energy, and creates cooling concerns.
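The effect of padded memory on consolidation is simple arithmetic. The following is a minimal Python sketch, assuming a hypothetical 256 GB host, a fixed hypervisor overhead and no memory overcommitment; the function name and every figure are illustrative and do not come from any vendor tool.

```python
def vms_per_host(host_gb: int, per_vm_gb: int, overhead_gb: int = 16) -> int:
    """Return how many equally sized VMs fit in host memory without overcommitment.

    overhead_gb is an assumed allowance for the hypervisor itself.
    """
    usable_gb = host_gb - overhead_gb
    if per_vm_gb <= 0 or usable_gb <= 0:
        raise ValueError("memory sizes must be positive")
    return usable_gb // per_vm_gb


# Hypothetical host: 256 GB of RAM, 16 GB held back for the hypervisor.
right_sized = vms_per_host(256, per_vm_gb=8)   # 240 // 8
padded = vms_per_host(256, per_vm_gb=16)       # 240 // 16
print(right_sized, padded)  # prints: 30 15
```

Under these assumptions, doubling each VM's memory halves the number of VMs the host can carry, which is the kind of pressure that leads to extra server purchases.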
Finding overprovisioned VMs
It's impossible to fix an overprovisioned VM without determining whether the VM is overprovisioned in the first place. Several tools can help make that determination. IT experts rely on remote monitoring and management tools, like Kaseya VSA and SolarWinds Virtualization Manager, to alert staff when a VM resource needs a change. Others choose tools that align with the hypervisor vendor.
"We primarily use vRealize for insight into overprovisioned and underprovisioned systems," said Aldo Cabrera, network engineer and release manager at W.P. Carey, a real estate investment firm. "We also have monitoring tools that give us immediate insight into disk, network and RAM use through SNMP [Simple Network Management Protocol] and script triggers." The hypervisor platform itself can also include performance counters and monitoring features -- such as vSphere's performance charts, host health dashboard, reporting and alerting -- along with command-line tools such as VMware's esxtop.
So what factors actually suggest an overprovisioned VM?
Consider processor or CPU usage first. Temporary spikes in processor usage are normal, but consistently high processor usage -- perhaps more than 90% -- can point to vCPUs being overprovisioned on the server. Remember that it's easy to create more vCPUs and allocate them to VMs, but every vCPU has to be scheduled and wait for a physical CPU in order to process instructions and data for the VMs. Too many vCPUs competing for the same physical cores leads to high ready time -- usually over 10% to 20% -- where vCPUs queue up and wait for physical processor resources. This kind of overprovisioning can degrade the performance of every VM on the server.
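The thresholds above can be turned into a simple check. The sketch below is in Python and rests on two assumptions that are not from this article: that ready time arrives as a summation in milliseconds over a sampling interval, and that the interval is 20 seconds, as in vSphere's real-time charts. The function names are hypothetical, and the 90% and 10% limits are just the rough figures quoted above.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready summation (ms) into a percentage of the interval.

    Assumes the value covers a single vCPU over one sampling interval.
    """
    return ready_ms * 100.0 / (interval_s * 1000.0)


def cpu_warning_signs(avg_usage_pct: float, ready_pct: float,
                      usage_limit: float = 90.0,
                      ready_limit: float = 10.0) -> list[str]:
    """Return the warning signs described in the text, if any apply."""
    signs = []
    if avg_usage_pct > usage_limit:
        signs.append("sustained high CPU usage")
    if ready_pct > ready_limit:
        signs.append("high CPU ready time")
    return signs


# Hypothetical sample: 3,000 ms of ready time in a 20-second interval.
ready = cpu_ready_percent(3000)
print(ready, cpu_warning_signs(avg_usage_pct=72.0, ready_pct=ready))
```

A VM that trips only the ready-time check is a reasonable candidate for the comparison against its neighbors, since the contention may come from other VMs on the same server.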
Compare the performance reports of a troubled VM against other VMs on the same server and across other servers. The root cause of the problems is often too many vCPUs, too many VMs or a poorly configured CPU limit setting on the troubled VM. Reduce the number of unnecessary vCPUs allocated to the VM. For example, allocating two or four vCPUs to a single-threaded VM wastes vCPUs because the single-threaded application can use only one vCPU. Increasing the CPU shares priority or setting CPU reservations for the VM can help by giving the vCPUs more access to the physical CPUs. Workload balancing -- migrating the troubled VM to another server with more free resources -- can also reduce the number of vCPUs running on the server.
Provisioning problems also frequently extend to memory allocation. Each VM should have slightly more memory than is required for the VM and its application, and the server needs more total memory than the combined memory used by all resident VMs. Giving a VM more memory than it and its application need offers no benefit. Check each VM's memory usage and free memory values along with active and granted memory size reports.
Memory usage that is always too high -- 95% or more -- or free memory that is always too low -- 5% or less -- suggests memory underprovisioning. In that case, active memory is often equal to granted memory, so there isn't enough free memory. This hurts VM performance through excess disk swapping, and the hypervisor may use aggressive memory reclamation techniques, such as memory ballooning, to recover and reuse idle memory. Conversely, memory usage that is consistently low alongside free memory that is consistently high suggests overprovisioning; memory allocated to the VM can typically be reduced to free resources for other VMs. In some cases, reducing unnecessarily high memory reservation settings can also free that excess memory for reclamation and reuse.
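As a rough illustration, the memory rules of thumb above might be expressed as follows in Python. The 95% ceiling comes from the text; the 40% floor for "too low" is purely an assumption, since the article gives no figure, and the function name is hypothetical. The inputs should be sustained averages, not single samples.

```python
def classify_memory(granted_mb: float, active_mb: float,
                    high_pct: float = 95.0, low_pct: float = 40.0) -> str:
    """Classify a VM's memory allocation from granted and active memory.

    high_pct follows the article's rule of thumb; low_pct is an assumed
    cutoff for "usage that is too low".
    """
    if granted_mb <= 0:
        raise ValueError("granted memory must be positive")
    usage_pct = active_mb * 100.0 / granted_mb
    if usage_pct >= high_pct:       # equivalent to 5% or less free
        return "underprovisioned"
    if usage_pct <= low_pct:
        return "overprovisioned"
    return "ok"


# Hypothetical averages for three VMs, each granted 8 GB.
for active in (7900, 5000, 2048):
    print(active, classify_memory(8192, active))
```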
Storage capacity rarely impacts performance directly, but it's still wise to review the logical unit number (LUN) volumes assigned to VMs and monitor how that capacity is used. Allocating a large LUN to a VM that won't use it can be a waste of expensive storage capacity. Thin provisioning can help reduce costly storage waste because the actual physical disk capacity installed may be only a small fraction of the logical volume size that was specified. For example, it's much cheaper to thin provision a 100 GB LUN with only 10 GB of physical disk allocated, and then add physical disk capacity later as the physical volume fills.
It's important to monitor capacity and add more physical disk space before allocated capacity runs out. Also, watch disk performance factors, such as latency, to ensure that storage performance issues don't impact VM performance.
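A capacity check for a thin-provisioned LUN can be sketched in a few lines of Python. The class, the function and the 80% trigger are all hypothetical; the 100 GB and 10 GB figures reuse the example above, and the used-capacity values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThinLun:
    logical_gb: float    # size presented to the VM
    physical_gb: float   # disk capacity actually backing the LUN
    used_gb: float       # data written so far


def needs_expansion(lun: ThinLun, threshold: float = 0.8) -> bool:
    """Return True when used capacity reaches the assumed share of physical disk."""
    return lun.used_gb >= threshold * lun.physical_gb


# The 100 GB LUN backed by 10 GB of physical disk, at two points in time.
print(needs_expansion(ThinLun(100, 10, used_gb=7.0)))   # prints: False
print(needs_expansion(ThinLun(100, 10, used_gb=8.5)))   # prints: True
```

The trigger is set below 100% so that there is time to add disk before writes start to fail; the right margin depends on how quickly the volume grows.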
Right-sizing resource allocation
The best way to prevent VM overprovisioning is to allocate the optimum level of resources to each VM from the start, but right-sizing a VM can be tricky. Many organizations base preliminary allocation decisions on detailed conversations with the application vendors, assuming that the vendor has the best knowledge of the application and its requirements.
Still, it may not be wise to take a vendor's suggestions as the final word. "A client informed us that their software vendor highly recommended 48 cores and 128 GB of RAM for a VM," Gorcester said. "After tuning the system, we settled at four CPU cores and 24 GB of RAM for the best performance. The user experience declined when we went above four cores, and we simply had no need for more than 24 GB of RAM."
IT staff expertise along with performance monitoring, testing and tuning are also vital for establishing the best resource provisioning and best price-to-performance ratio. Start conservatively by allocating the minimum resources determined for the workload. Watch the performance and expect occasional dips as workload demands change. That's perfectly normal. Inadequate and strained resources can easily be tweaked in small increments. "It's better to underprovision or stick to the minimum suggestions and review it later," Cabrera said. "Reducing resources may not be wanted by those owning the services on the server. Right-sizing and adding more resources will always be welcomed, but taking resources away is [politically] harder."
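The start-small, adjust-in-increments approach can be outlined as a loop. This is a minimal Python sketch, not a tuning procedure from either expert: `measure` is a hypothetical callable that returns an observed latency for a given allocation, the 5% minimum gain is an assumed stopping rule, and the latency table is invented test data.

```python
from typing import Callable


def right_size(measure: Callable[[int], float], start: int, step: int,
               maximum: int, min_gain: float = 0.05) -> int:
    """Grow an allocation in small steps, starting from the minimum.

    Stops as soon as the next step fails to cut latency by at least
    min_gain (a fraction), and returns the last allocation that helped.
    """
    best = start
    best_latency = measure(start)
    candidate = start + step
    while candidate <= maximum:
        latency = measure(candidate)
        if latency > best_latency * (1.0 - min_gain):
            break  # no meaningful gain, or performance got worse
        best, best_latency = candidate, latency
        candidate += step
    return best


# Invented latencies (ms) by vCPU count; performance worsens past 4 vCPUs.
sample_latency_ms = {1: 400.0, 2: 250.0, 3: 180.0, 4: 150.0, 5: 155.0, 6: 170.0}
print(right_size(sample_latency_ms.__getitem__, start=1, step=1, maximum=6))  # prints: 4
```

In practice each measurement would come from a monitoring period under representative load, not a single reading, so that the normal dips mentioned above do not end the search early.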
It's also helpful to leverage any resource recovery and workload balancing tools. For example, implement the hypervisor's resource recovery features like dynamic memory, memory ballooning and transparent page sharing. VMware's distributed resource scheduler and similar tools help to orchestrate VM migrations and make best use of each server's available resources. This optimizes the total number of VMs that the environment can support while providing the best performance for those workloads.
Not all overprovisioning is necessarily bad. A judicious dose of additional resources can smooth a workload's performance, improve the user experience and keep the application's stakeholders happy. But simply throwing more resources at a workload can waste capacity, cost money and even threaten the performance you're trying to improve. It takes the right tools and objective evaluation of the data to make sound decisions about resource usage.