If you have 20 VMs running on a single physical server, you don't want all of them to fail because of a host hardware failure. Implementing some type of high-availability (HA) safeguard is an absolute must, according to John Humphreys, program vice president at IDC's Enterprise Platform Group.
"With virtualization, you are putting five, six, 10 mission-critical servers on one physical machine," Humphreys said. "When you do that, you need a service that promises high availability. People will be willing to pay a premium to guarantee their VMs will be up and running at all times."
And pay they will, since HA software from virtualization vendors or third-party HA vendors costs extra.
At the basic level, high-availability software moves VMs off a crippled machine during a component failure and onto a designated failover machine to avoid serious downtime. Different levels of HA can be achieved with different types of software or with traditional clustering.
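As a rough illustration of the failover step described above, the following Python sketch restarts the VMs from a failed host on one designated failover host. It is a toy model under stated assumptions, not how VMware HA or any other product works internally: each VM is reduced to a single memory requirement, placement is first come, first served, and the `VM` class, the `fail_over` function and the sample sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    mem_gb: int  # hypothetical: memory is the only resource modeled


def fail_over(vms_on_failed_host, failover_free_gb):
    """Restart VMs from a failed host on a designated failover host.

    VMs are placed in the order given; each one that fits in the failover
    host's remaining free memory is restarted, and any VM that does not fit
    is left powered off. Returns (restarted, not_restarted) as name lists.
    """
    restarted, not_restarted = [], []
    free = failover_free_gb
    for vm in vms_on_failed_host:
        if vm.mem_gb <= free:
            free -= vm.mem_gb  # reserve room for this VM on the failover host
            restarted.append(vm.name)
        else:
            not_restarted.append(vm.name)
    return restarted, not_restarted


if __name__ == "__main__":
    failed_host = [VM("web01", 8), VM("web02", 8), VM("db01", 32), VM("app01", 16)]
    up, down = fail_over(failed_host, failover_free_gb=40)
    print("restarted:", up)
    print("not restarted:", down)
```

With 40 GB free on the failover host, `web01`, `web02` and `app01` restart, while the 32 GB `db01` stays down because it no longer fits. Real HA products typically consider more factors, such as CPU, reservations and restart priorities, but the basic constraint is the same: a VM can only restart where there is room for it.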
While HA software is simple to install and run, it has some drawbacks, said Rick J. Scherer, a systems administrator for the non-profit San Diego Data Processing Corp. The IT organization uses VMware Inc.'s virtualization software in production and for disaster recovery across its two data centers. Scherer uses both VMware HA and Microsoft Cluster Service to provide HA.
Scherer's production environment consists of 29 host servers with enough CPU, memory and I/O to support a lot of VMs. As of April 2009, Scherer was running 400 VMs on those 29 hosts, or about 14 VMs per host. He plans to add 100 more virtual machines by the end of the year.
For the lowest level of HA, Scherer runs VMware HA software within ESX clusters. For example, VMware HA is used in a cluster of five ESX hosts running 150 VMs with Windows guest operating systems, Web services and other apps that don't need a high level of availability, he said.
With this HA solution, though, an administrator must monitor resources on host servers to ensure that there are enough resources to host more VMs during failover, Scherer said. "We have experienced hardware failures where not all of the VMs restarted due to a lack of resources [on the failover machine]," he said. "We try to always have a 10% to 20% buffer on our systems so we know we have enough resources."
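The size of that buffer can be reasoned about with simple arithmetic. The sketch below assumes, purely for illustration, identical hosts, a single aggregate resource such as memory, and load that can be spread freely across the surviving hosts. Under those assumptions, surviving the loss of one host in an N-host cluster means keeping 1/N of total capacity free. The function names are hypothetical, and this is neither Scherer's method nor VMware's admission-control algorithm.

```python
def required_buffer(num_hosts: int, host_failures: int = 1) -> float:
    """Fraction of total cluster capacity to keep free so that the load of
    `host_failures` failed hosts can be absorbed by the survivors.

    Simplifying assumptions (not from the article): identical hosts, one
    aggregate resource such as memory, and load that can be split freely
    across hosts, so fragmentation and per-VM reservations are ignored.
    """
    if not 0 <= host_failures < num_hosts:
        raise ValueError("host_failures must be >= 0 and < num_hosts")
    return host_failures / num_hosts


def can_absorb_failure(host_loads, host_failures=1):
    """Return True if the surviving hosts have enough total capacity.

    host_loads: per-host load as a fraction of one host's capacity (0.0-1.0).
    Under the aggregate model above, it does not matter which hosts fail:
    the check passes exactly when the total load fits in the capacity of
    the hosts that remain.
    """
    return sum(host_loads) <= len(host_loads) - host_failures


if __name__ == "__main__":
    # Five-host cluster, as in the VMware HA example: 1 of 5 hosts = 20%.
    print(f"{required_buffer(5):.0%}")
    # A hypothetical 10-host cluster needs 10% under the same model.
    print(f"{required_buffer(10):.0%}")
    # Five hosts at 85% load cannot absorb a host failure; at 75% they can.
    print(can_absorb_failure([0.85] * 5))
    print(can_absorb_failure([0.75] * 5))
```

Under this simplified model, a 10% to 20% buffer corresponds to tolerating one host failure in clusters of 10 down to five hosts. Because real VMs cannot be split across hosts, a cluster that passes this aggregate check can still leave individual VMs without a host that fits them, so these figures are a minimum rather than a guarantee.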
Whether you use an HA software product or a clustering method, it is important to eliminate single points of failure within the virtual environment. Any single point of failure that remains can undermine all of your failover efforts.
This was first published in November 2009