Avoiding CPU contention and bottlenecks

Bottlenecks can leave your VM twiddling its thumbs. Luckily, there are several techniques to prevent CPU contention and maximize performance.

In some environments CPU bottlenecks can happen. In these situations, the memory payload of the guest OS and application is small, but the service is carrying out a high volume of transactions per second. Before long, this type of memory traffic jam can result directly in CPU contention.

Despite all the fancy footwork that modern hypervisors bring to the table, a VM with one vCPU can still send threads to only a single CPU core. In this respect, raw performance is the number of cycles per second that core can provide. When this core becomes congested, CPU contention will diminish the overall performance of your VM.

Bear in mind that with virtualization it is still unlikely that a VM with one vCPU would gain exclusive access to a core. It’s more likely that the VM will have to share the CPU with other VMs.

If this sharing is allowed to continue unchecked, it is possible that the CPU could become saturated with requests. In this scenario, CPU contention also takes place. Fundamentally, if a VM needs more CPU cycles than the core can provide, the only way to deliver CPU cycles greater than 100% is to configure the VM with two or more vCPUs to deliver true symmetric multi-processing.

Before you do that, though, you need to decide if the CPU is the constraining resource and confirm that the application within the guest operating system is multi-threaded. There’s little point in giving the VM two vCPUs if threads are executed only on CPU0 while CPU1 is still there twiddling its thumbs.

As time goes on, you might find yourself running on the more modern Intel Nehalem architecture. Studies have shown that a single Intel Nehalem core can actually outperform an SMP-enabled system using the older CPU types.

To identify CPU bottlenecks, investigate the following areas:

  • Using your hypervisor, identify if any VMs are using 100% of the CPU. Avoid using the guest operating system tools in the VM, and look for performance data delivered by the hypervisor -- it will be more accurate.
  • Look for high %Ready values in VMware ESX because this is an indication that your VM would like more CPU cycles but isn’t receiving them.
  • Look for high co-stop values because this can show excessive use of SMP in your VMs.

That last tip needs some explanation. Sometimes administrators go overboard in giving every VM more than one vCPU as a standard—even when it’s not entirely necessary. If you use virtual SMP excessively, you can give the hypervisor more work than it’s expecting. It has to work harder to schedule multiple vCPUs across multiple cores inside the CPU socket.

This can actually increase the very CPU contention you were trying to avoid in the first place. So strictly control the use of virtual SMP, especially on modern CPU architectures where its benefits may be limited.
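The three checks listed above can be turned into a rough triage script. The sketch below is a minimal illustration, not a VMware tool: it assumes you have already exported per-VM counters from the hypervisor, and the field names (`cpu_usage_pct`, `ready_ms`, `costop_ms`), the thresholds and the per-vCPU averaging are all hypothetical choices to tune for your own environment. The one fixed piece is the conversion of a summation counter in milliseconds into a percentage of the sampling interval; the 20-second default matches the interval of vSphere real-time charts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmSample:
    """One sampling interval of hypervisor-side counters (hypothetical layout)."""
    name: str
    vcpus: int
    cpu_usage_pct: float  # hypervisor-reported CPU usage, 0-100
    ready_ms: float       # CPU ready summation over the interval, in ms
    costop_ms: float      # co-stop summation over the interval, in ms


def summation_to_pct(value_ms: float, interval_s: float = 20.0) -> float:
    """Convert a summation counter (ms) into a percentage of the interval."""
    return value_ms / (interval_s * 1000.0) * 100.0


def triage(sample: VmSample,
           interval_s: float = 20.0,
           usage_limit: float = 95.0,   # assumed threshold
           ready_limit: float = 5.0,    # assumed threshold, percent per vCPU
           costop_limit: float = 3.0    # assumed threshold, percent per vCPU
           ) -> List[str]:
    """Return a list of findings for one VM; an empty list means no flags."""
    findings = []
    # Assumption: VM-level counters are summed across vCPUs, so divide by
    # the vCPU count to get an average per-vCPU figure.
    ready = summation_to_pct(sample.ready_ms, interval_s) / sample.vcpus
    costop = summation_to_pct(sample.costop_ms, interval_s) / sample.vcpus

    if sample.cpu_usage_pct >= usage_limit:
        findings.append("cpu saturated")
    if ready > ready_limit:
        findings.append("high ready")
    if costop > costop_limit:
        findings.append("high co-stop")
    return findings
```

A VM flagged only with "high co-stop" is a candidate for fewer vCPUs rather than more, which is the point made above about excessive virtual SMP.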

Network Bottlenecks
It’s a common misconception that with many VMs sharing the same physical networks, network bandwidth would become scarce. As with CPU resources, most networks work in a non-linear way so that many systems can co-exist on the same network without treading on each other’s toes.

In most environments you will struggle to see VMs saturate even a 1Gbps interface. These interfaces are commonly teamed together for fault tolerance and load balancing. So if saturation is unlikely in a 1Gbps environment, it’s even less likely when you have bundles of network interface cards.

The reality is that your network bottlenecks are more likely to be seen during the process commonly called live migration, in which VMs with large memory allocations are moved from one physical server to another. Again, right-sizing your VM’s memory allocation is critical in reducing the performance hit that live migration brings and avoiding bottlenecks. So it’s important to dedicate network interfaces of gigabit speed or above to this ancillary process.
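To see why memory size and link speed matter here, a back-of-the-envelope estimate helps. The sketch below is only an illustration under stated assumptions: the function name and the `efficiency` factor (the share of the nominal link rate actually available for the copy) are hypothetical, and the result is a lower bound because it models a single pass over the VM’s memory. Real live migrations re-copy pages that the guest modifies during the transfer, so they take longer.

```python
def migration_seconds(memory_gib: float,
                      link_gbps: float,
                      efficiency: float = 0.8) -> float:
    """Rough lower bound on the time to copy a VM's memory once over a link.

    memory_gib -- configured VM memory in GiB
    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed usable fraction of the link (hypothetical default)
    """
    if memory_gib < 0:
        raise ValueError("memory_gib must not be negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    bits_to_move = memory_gib * 1024 ** 3 * 8          # GiB -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / usable_bits_per_second
```

For example, `migration_seconds(8, 1.0)` comes out at roughly 86 seconds with the default efficiency, whereas the same VM on a 10Gbps link needs under 9 seconds. Halving the VM’s memory allocation halves the estimate, which is the right-sizing argument in numbers.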

Storage Bottlenecks
Storage oversubscription is usually caused by simple administrative errors. What’s surprising are the IO demands that virtual desktop projects can sometimes impose at the storage layer. It’s not uncommon to see bursts of storage and CPU IO caused by “boot storms” and anti-virus scanning. Storage vendors can help by allowing customers to purchase caching modules that add solid-state storage to alleviate the IO chokepoints in VDI.

Adequate planning needs to occur at an early stage so the costs of scaling up a VDI solution are exposed early. The tipping point for the use of such caching technologies appears to be 500 to 600 VMs.

Below this point, simply distributing the virtual desktops across various arrays, LUNs and spindles appears to be enough. Beyond the 500 to 600 VM range, businesses should seriously begin considering solid-state solutions as a way of taking the disk spindle out of the equation.

This caching approach can also alleviate some of the IOPS generated by routine VDI tasks such as creating and destroying virtual desktops as users log out or join the system.

For server-based VMs that are disk IO bound, a few simple steps can boost performance. Although it’s perfectly fine for 10 or 20 VMs to occupy the same volume/LUN, it is reasonable to dedicate a volume/LUN to a specific disk IO bound application, or simply to reduce the ratio of VMs to a datastore. Either step increases the disk IOPS available to the competing VMs and shortens the disk queues to storage. Other optimization techniques include the following:

  • When possible, use the hypervisor’s paravirtualized SCSI controller inside the VM.
  • Distribute the virtual disks of the VM across multiple volume/LUNs to ensure virtual disks do not compete against each other for IO.
  • Apply permissions on datastores to prevent administrators from creating VMs in the wrong location, for example by choosing storage according to free space alone without considering the IOPS needs of the VM.
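The last point, choosing a datastore by IOPS needs rather than by free space alone, can be sketched as a small placement routine. This is a hypothetical illustration, not a hypervisor feature: the `Datastore` fields, the capacity figures and the "most IOPS headroom left" rule are all assumptions, and in practice the IOPS numbers would come from your own storage monitoring.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datastore:
    """Hypothetical view of a datastore for placement decisions."""
    name: str
    free_gb: float
    iops_capacity: float   # assumed sustainable IOPS of the backing LUN
    iops_committed: float  # expected IOPS of the VMs already placed

    @property
    def iops_headroom(self) -> float:
        return self.iops_capacity - self.iops_committed


def pick_datastore(datastores: List[Datastore],
                   vm_disk_gb: float,
                   vm_iops: float) -> Optional[Datastore]:
    """Pick the datastore that fits the VM and keeps the most IOPS headroom.

    Returns None when no datastore has both the space and the IOPS.
    """
    candidates = [
        d for d in datastores
        if d.free_gb >= vm_disk_gb and d.iops_headroom >= vm_iops
    ]
    if not candidates:
        return None
    # Free space is only a filter; the ranking is by remaining IOPS headroom.
    return max(candidates, key=lambda d: d.iops_headroom - vm_iops)
```

With one datastore that has 2,000 GB free but only 100 IOPS of headroom and another with 500 GB free and 2,000 IOPS of headroom, a 100 GB VM needing 300 IOPS lands on the second, whereas sorting by free space would have picked the first.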

Although it’s true that higher consolidation ratios increase the potential for saturation of core resources, it’s not inevitable. Today’s hardware and software are keeping up with these increased resource demands.

In the world of virtualization, memory continues to be a key constraint. But there is plenty an administrator can do to tweak performance and avoid CPU contention before it gets out of hand.

Eventually the constraint on consolidation ratios may be more about businesses feeling anxious about putting too many eggs in one basket. In this respect it could be that an availability gap is opening. Although hardware and hypervisors increase resource capabilities, the risks of very high consolidation ratios remain despite the widespread use of hypervisor clustering, fault tolerance and in-guest service protection tools.

Mike Laverick is a professional instructor with 17 years of experience in technologies such as Novell, Windows and Citrix. Involved with the VMware community since 2003, Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. He is also the owner and author of the virtualization website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/Virtual Center users.
