How server load-balancing services work: VMware DRS, Microsoft SCVMM

If you know how your load-balancing service calculates the ins and outs of virtual server load balancing, you can tune clusters to better meet your organization's needs...

Virtual server load balancing among cluster hosts is all about the math. An automated server load-balancing service calculates resource utilization, then compares one host's available capacity with that of other hosts to determine whether a cluster needs rebalancing.

But it's not an exact science. Various load-balancing services use different calculation models to determine whether a cluster is balanced. VMware vSphere's Distributed Resource Scheduler (DRS) feature, for example, uses different metrics than does Microsoft System Center Virtual Machine Manager's Performance and Resource Optimization (PRO) feature. Ultimately, however, admins need a combination of performance monitoring and calculations before they live-migrate a virtual machine (VM) for load balancing.

Most of us leave cluster load balancing to an automated load-balancing service, but it's important to understand the calculations that service uses. Once you understand these metrics, you can tell when a load-balancing service should be tuned for better results. Plus, you're better able to recognize when a vendor's server load-balancing offering isn't true load balancing.

VMware's load-balancing service: Distributed Resource Scheduler

The VMware DRS load-balancing service uses two metrics to determine whether a cluster is out of balance: the current host load standard deviation and the target host load standard deviation. When the current value is greater than the target, DRS recognizes that the cluster is unbalanced. To rebalance the cluster, DRS usually uses vMotion to migrate VMs off an overloaded host.

These server load-balancing metrics reside in the VMware DRS pane inside the vSphere Client. DRS gathers its values by analyzing each host's CPU and memory resources to determine a load level. Then, the load-balancing service determines an average load level and standard deviation from that average. As long as vSphere is operational, DRS re-evaluates its cluster load every five minutes to check for balance.
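The check described above can be sketched in a few lines of Python. This is a minimal illustration, not VMware's implementation: the `Host` class, its field names, the sample figures and the definition of load as summed VM demand divided by host capacity are all assumptions, as is the use of the population standard deviation.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Host:
    name: str
    capacity_mhz: float          # hypothetical: total CPU capacity of the host
    vm_demand_mhz: list[float]   # hypothetical: resources demanded by each VM

    @property
    def load(self) -> float:
        # Assumed load metric: summed VM demand as a fraction of host capacity.
        return sum(self.vm_demand_mhz) / self.capacity_mhz


def current_std_dev(hosts: list[Host]) -> float:
    # Population standard deviation of the per-host load levels.
    return pstdev([h.load for h in hosts])


def needs_rebalancing(hosts: list[Host], target_std_dev: float) -> bool:
    # Unbalanced when the current deviation exceeds the target deviation.
    return current_std_dev(hosts) > target_std_dev


cluster = [
    Host("esx01", 10_000, [3_000, 3_000, 2_000]),  # load 0.8
    Host("esx02", 10_000, [2_500, 2_500]),         # load 0.5
    Host("esx03", 10_000, [2_000]),                # load 0.2
]
print(round(current_std_dev(cluster), 3))             # 0.245
print(needs_rebalancing(cluster, target_std_dev=0.2))  # True
```

In this sketch, moving one 3,000 MHz VM from esx01 to esx03 would leave every host at a load of 0.5 and bring the deviation to zero.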

If the load-balancing service determines that rebalancing is necessary, DRS prioritizes which virtual machines need to be rebalanced across a cluster. Using the following equation, the service calculates a host's balance compared with other hosts in the cluster:

Figure 1: This equation determines cluster load balancing.

A perfectly balanced cluster reports a zero for its current host load standard deviation. That means every host carries the same load as the others in the cluster. If that number increases, it means the VMs on one host require more resources than the average and that the load on that host is out of line with the levels on other hosts.
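A worked example makes the zero case concrete. The notation below is an assumption for illustration, not taken from the article's figure: each host's load N_h is taken as the sum of its VMs' resource entitlements E_i divided by the host's capacity C_h, and the standard deviation is computed across the H hosts in the cluster.

```latex
% Assumed notation: E_i = entitlement of VM i, C_h = capacity of host h, H = number of hosts
N_h = \frac{\sum_{i \in h} E_i}{C_h}, \qquad
\bar{N} = \frac{1}{H} \sum_{h=1}^{H} N_h, \qquad
\sigma = \sqrt{\frac{1}{H} \sum_{h=1}^{H} \left( N_h - \bar{N} \right)^2}

% Balanced: three hosts at loads 0.5, 0.5, 0.5
\bar{N} = 0.5, \qquad \sigma = \sqrt{\tfrac{1}{3}\,(0 + 0 + 0)} = 0

% Unbalanced: three hosts at loads 0.7, 0.5, 0.3
\bar{N} = 0.5, \qquad \sigma = \sqrt{\tfrac{1}{3}\,(0.04 + 0 + 0.04)} \approx 0.163
```

The further any one host drifts from the cluster average, the larger the squared term it contributes and the higher the reported deviation.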

DRS then makes prioritized recommendations to restore balance. Priority-one recommendations should be implemented immediately, while priority-five recommendations won't do much to fix the imbalance.

Server load balancing with Microsoft's Performance and Resource Optimization

Microsoft's System Center Virtual Machine Manager (SCVMM) takes a different approach to cluster load balancing. Natively, it doesn't take into account aggregate cluster conditions when calculating resource utilization. Its load-balancing service, PRO, considers only overutilization on individual hosts.

You should also note some important conditions with SCVMM. Neither Hyper-V nor SCVMM alone can automatically relocate VMs based on performance conditions. SCVMM can relocate virtual machines only after it has been integrated with System Center Operations Manager (SCOM) and once PRO is enabled. That's because SCVMM requires SCOM to support VM monitoring.

In SCVMM 2008 R2, if host resources are overloaded, virtual machines can be live-migrated off a cluster host. According to a Microsoft TechNet article, SCVMM recognizes that a host is overloaded when memory utilization is greater than "physical memory on the host minus the host reserve value for memory on the host." It also recognizes when CPU utilization is greater than "100% minus the host reserve for CPU on the host."
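The two thresholds quoted above translate directly into a per-host check. The following is a rough sketch only: the function and parameter names are hypothetical, and treating either condition alone as enough to flag the host is an assumption.

```python
def is_host_overloaded(
    memory_used_mb: float,
    physical_memory_mb: float,
    memory_reserve_mb: float,
    cpu_used_pct: float,
    cpu_reserve_pct: float,
) -> bool:
    # Memory threshold: physical memory minus the host reserve for memory.
    memory_limit_mb = physical_memory_mb - memory_reserve_mb
    # CPU threshold: 100% minus the host reserve for CPU.
    cpu_limit_pct = 100.0 - cpu_reserve_pct
    # Assumption: exceeding either threshold marks the host as overloaded.
    return memory_used_mb > memory_limit_mb or cpu_used_pct > cpu_limit_pct


# A 32 GB host with a 2 GB memory reserve and a 20% CPU reserve (sample values).
print(is_host_overloaded(31_000, 32_768, 2_048, 50.0, 20.0))  # True: memory over 30,720 MB
print(is_host_overloaded(20_000, 32_768, 2_048, 50.0, 20.0))  # False: both under threshold
```

Note that the check looks at one host in isolation; nothing in it compares that host with the rest of the cluster.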

Neither server load-balancing calculation aggregates metrics throughout the cluster to determine resource balance. But SCVMM uses a per-host rating system that determines where to live-migrate VMs once a host is overloaded. The system uses four resources in its algorithm: CPU, memory, disk I/O capacity and network capacity. You can prioritize these resources with a slider in the SCVMM console.
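To see how weighting the four resources can change which host receives a VM, consider the sketch below. It is purely illustrative: the article does not describe SCVMM's actual rating formula, so the weighted average of free capacity, the function names, the host names and the sample figures are all assumptions.

```python
RESOURCES = ("cpu", "memory", "disk_io", "network")


def host_rating(free_fraction: dict[str, float], weights: dict[str, float]) -> float:
    # Hypothetical rating: weighted average of the free fraction of each resource.
    total_weight = sum(weights[r] for r in RESOURCES)
    return sum(weights[r] * free_fraction[r] for r in RESOURCES) / total_weight


def best_target(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> str:
    # Pick the candidate host with the highest rating.
    return max(candidates, key=lambda name: host_rating(candidates[name], weights))


candidates = {
    "hv01": {"cpu": 0.2, "memory": 0.6, "disk_io": 0.5, "network": 0.5},
    "hv02": {"cpu": 0.7, "memory": 0.3, "disk_io": 0.5, "network": 0.5},
}

# Stand-ins for two slider settings: one favors memory, the other favors CPU.
favor_memory = {"cpu": 1, "memory": 3, "disk_io": 1, "network": 1}
favor_cpu = {"cpu": 3, "memory": 1, "disk_io": 1, "network": 1}

print(best_target(candidates, favor_memory))  # hv01
print(best_target(candidates, favor_cpu))     # hv02
```

The same two hosts swap places depending on which resource is given priority, which is why the priority settings are worth reviewing for each workload.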

There's also an alternative solution for server load balancing: a PowerShell script that analyzes cluster conditions. Running the script balances virtual machines across a cluster by comparing the memory properties of hosts and VMs in the cluster.

Load-balancing services use numerous calculations to determine whether clustered VMs are balanced. But if you don't understand how your service computes these metrics, server load balancing is tricky. Even if you're not a math whiz, these metrics help prevent load-balancing problems.


About the expert
Greg Shields is an independent author, instructor, Microsoft MVP and IT consultant based in Denver. He is a co-founder of Concentrated Technology LLC and has nearly 15 years of experience in IT architecture and enterprise administration. Shields specializes in Microsoft administration, systems management and monitoring, and virtualization. He is the author of several books, including Windows Server 2008: What's New/What's Changed, available from Sapien Press.
