Providing high availability (HA) in a virtual datacenter is a multi-tiered task that involves live backups, failover functionality and/or clustering, among other things. I covered virtual machine (VM) backups in the last installment of this series. Now, let's look at how to configure clusters or, at least, create failover structures in virtual environments.
HA in virtualization can take place at two levels. We can work at the guest level, relying on the disaster recovery capabilities of the OS and applications, or at the host level, where we face a new class of problems.
The process of implementing HA configurations at the guest level is almost identical to what we already do in physical environments. There are some technical issues to address, like configuring a static MAC address for each virtual network interface, and some limitations, which depend on the chosen virtualization platform and on the chosen HA software. But it's basically always possible to create a virtual cluster, or even a mixed one, where one or more nodes are virtual machines while others are physical ones.
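As an illustration of the static MAC requirement, here is a minimal sketch for a VMware guest. It assumes the conventions VMware documents for manually assigned addresses (the 00:50:56 prefix with a fourth octet no higher than 0x3F, and the `ethernetN.addressType` / `ethernetN.address` keys in the .vmx file); check them against your product version before relying on them. The function names are hypothetical.

```python
import random

# Prefix VMware reserves for manually assigned MAC addresses (assumption: verify
# against the documentation for your product version).
VMWARE_STATIC_PREFIX = (0x00, 0x50, 0x56)


def static_vmware_mac(rng=None):
    """Return a random MAC address inside the manually assignable range."""
    rng = rng or random.Random()
    # The fourth octet must stay within 0x00-0x3F for static addresses.
    tail = (rng.randint(0x00, 0x3F), rng.randint(0x00, 0xFF), rng.randint(0x00, 0xFF))
    return ":".join(f"{octet:02x}" for octet in VMWARE_STATIC_PREFIX + tail)


def vmx_lines(nic_index, mac):
    """Build the two .vmx entries that pin a virtual NIC to a static MAC."""
    return [
        f'ethernet{nic_index}.addressType = "static"',
        f'ethernet{nic_index}.address = "{mac}"',
    ]


if __name__ == "__main__":
    for line in vmx_lines(0, static_vmware_mac()):
        print(line)
```

Each cluster node would get its own address, recorded somewhere central so that no two guests on the same network segment end up with the same one.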
Much more complex, but much more needed, is providing high availability for hosts. In such a scenario, considering failover for example, virtual machines running on one host have to be copied to another one and continuously synchronized, replicating virtual disk and virtual memory modifications. This operation involves the same problems as live backup, and adds the complexity of doing everything as fast as possible and as often as possible.
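To make the idea of continuous synchronization concrete, the following is a minimal sketch of block-level delta replication: only the disk blocks that changed since the last pass are shipped to the standby copy. It is not how any product named in this article works; the block size, the use of SHA-256 digests to detect changes, and all function names are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not taken from any product


def block_digests(disk, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of the disk image."""
    return [
        hashlib.sha256(disk[i:i + block_size]).digest()
        for i in range(0, len(disk), block_size)
    ]


def changed_blocks(source, replica_digests, block_size=BLOCK_SIZE):
    """Yield (index, data) for every source block that differs from the replica."""
    for index, digest in enumerate(block_digests(source, block_size)):
        if index >= len(replica_digests) or replica_digests[index] != digest:
            start = index * block_size
            yield index, source[start:start + block_size]


def apply_blocks(replica, blocks, block_size=BLOCK_SIZE):
    """Write the changed blocks into the replica, growing it if needed."""
    for index, data in blocks:
        start = index * block_size
        end = start + len(data)
        if len(replica) < end:
            replica.extend(b"\0" * (end - len(replica)))
        replica[start:end] = data


if __name__ == "__main__":
    source = bytearray(b"a" * 8192)
    replica = bytearray(source)
    source[5000] = ord("b")  # the running VM modifies its second block
    delta = list(changed_blocks(bytes(source), block_digests(bytes(replica))))
    apply_blocks(replica, delta)
    print(f"replicated {len(delta)} block(s); in sync: {replica == source}")
```

A real replicator would also have to capture memory state and obtain a consistent view of a disk that keeps changing while it is read, which is exactly where the live backup problems come back. The sketch also ignores a source disk that shrinks.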
Here Vizioncore is once again a protagonist with esxReplicator, which can copy a running VM from one VMware ESX Server to another, with or without a centralized storage facility. Unfortunately, this product doesn't handle the network modifications needed to perform a real failover, so we have to switch manually between a faulty host and a cold standby one.
A more dynamic solution is provided by VMware itself, which introduced a failover option based on VMotion with ESX Server 3 and VirtualCenter 2. Unlike Vizioncore esxReplicator, VMware HA automatically restarts the VMs of a faulty host. Unfortunately, VMware HA is much more demanding in terms of configuration. It requires VirtualCenter and VMotion and won't work if VMs are not stored in a Fibre Channel SAN environment.
Other approaches to HA
Physical-to-virtual (P2V) migration tools, on the other hand, can help us perform virtual-to-virtual migrations, so we could configure them to replicate virtual machines' contents from one host to another.
In this arena, PlateSpin is the preferred choice at the moment, offering live migration for Windows operating systems. It's also possible to use this technology for disaster recovery. Unfortunately, just like Vizioncore, PlateSpin doesn't handle every aspect of failover, so we still have to manually intervene.
Using failover is a good approach, but surely the most desirable HA configuration is clustering. In a cluster, multiple hosts act as an execution front-end for commonly shared virtual machines. If one of them goes down, there is no service interruption because virtual machines are always available through remaining hosts.
Clustering capability can be implemented at the host level as a native feature of a virtualization platform or with a third-party solution.
In Microsoft Virtual Server, for example, Windows is the host OS, and Microsoft provides clustering of the physical virtualization nodes through its Cluster Service.
VMware ESX Server, on the other hand, has no such feature, but counts on external solutions like Symantec Veritas Cluster Server to achieve the task. The recent acquisition of Rainfinity by EMC Corp. gives some hope that one day RainWall technology could be used to perform ESX clustering natively.
Today, clustering solutions for virtualization are far from mature, and businesses should perform rigorous tests before adopting any one of them.
Failover and clustering configurations are also complicated by different architectures: when virtual machines are moved from one host to another, they could be served by CPUs from different vendors, which are similar but not identical. Current virtualization platforms are still unable to handle these differences in real time during a live migration.
In similar fashion, if available hosts have different hardware configurations, a VM's virtual hardware assignments (four virtual CPUs, for example) may not be satisfied, preventing migration altogether.
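The two constraints just described, matching CPU vendors and enough physical CPUs for the VM's virtual ones, can be expressed as a simple pre-migration check. This is a minimal sketch under stated assumptions: the `Host` and `VM` records and the `migration_blockers` function are hypothetical, and a real platform would compare far more than these two attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cpu_vendor: str
    cpu_count: int


@dataclass(frozen=True)
class VM:
    name: str
    vcpus: int


def migration_blockers(vm, source, target):
    """Return the reasons why vm cannot move from source to target (empty if none)."""
    blockers = []
    if source.cpu_vendor != target.cpu_vendor:
        blockers.append(
            f"CPU vendor differs: {source.cpu_vendor} -> {target.cpu_vendor}"
        )
    if vm.vcpus > target.cpu_count:
        blockers.append(
            f"{vm.name} needs {vm.vcpus} virtual CPUs, "
            f"{target.name} has {target.cpu_count}"
        )
    return blockers


if __name__ == "__main__":
    vm = VM("db01", vcpus=4)
    old_host = Host("esx-a", "Intel", 4)
    new_host = Host("esx-b", "AMD", 2)
    for reason in migration_blockers(vm, old_host, new_host):
        print(reason)
```

Running such a check across every pair of hosts ahead of time shows which machines can actually back each other up, rather than discovering the mismatch during a failure.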
This whole situation may get worse in the near future, depending on how vendors implement support for paravirtualization. Consider that this approach requires new-generation CPUs, able to run the host operating system at a special ring level. If the virtualization platform is not able to run both the usual binary translation and paravirtualization concurrently, or if it's not able to switch seamlessly between them, it will be impossible to use a mix of old and new physical servers. In other words, we'll be obliged either to renew the whole hardware infrastructure each time we buy new gear or to decide carefully how to group hosts for high availability.
Last, but not least, we have to ensure reliable access to the storage facility, which is surely the most critical step. This is usually addressed by so-called multipathing: when hosts are equipped with two or more HBAs (host bus adapters), configured to reach more than one SAN, the storage management software dynamically selects a working link over faulty ones.
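The path selection logic behind multipathing can be sketched in a few lines. This is only a model of the behavior described above, not the driver of any storage vendor: the `Path` and `MultipathDevice` classes are hypothetical, and the health flag stands in for whatever link monitoring a real driver performs.

```python
from dataclasses import dataclass


@dataclass
class Path:
    hba: str      # host bus adapter the path starts from
    target: str   # SAN port the path ends at
    healthy: bool = True


class MultipathDevice:
    """Keeps I/O on the active path and fails over when that path goes down."""

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)
        self.active = self.paths[0]

    def select_path(self):
        # Stay on the current path as long as it works.
        if self.active.healthy:
            return self.active
        # Otherwise switch to the first healthy alternative.
        for path in self.paths:
            if path.healthy:
                self.active = path
                return path
        raise RuntimeError("all paths to storage are down")


if __name__ == "__main__":
    primary = Path("hba0", "san-a")
    secondary = Path("hba1", "san-b")
    device = MultipathDevice([primary, secondary])
    primary.healthy = False  # simulate a failed link
    print("I/O now goes through", device.select_path().hba)
```

The sketch only fails over when the active path breaks; it does not fail back or balance load across paths, both of which real multipathing software may offer.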
Having a software feature provided at the driver level creates some restrictions. Depending on which virtualization platform you choose, you may not have such capabilities. For instance, the current architecture of VMware ESX Server doesn't allow storage vendors to plug in their own drivers, and the provided ones don't support dynamic multipathing.
When choosing a hosted solution, like VMware Server or Microsoft Virtual Server, you rely instead on the host operating system to support OEMs' drivers, and that support is always available.
I've outlined the means and roadblocks to achieving HA in virtualized environments. I welcome any comments on my analysis of the situation. You can write to me via SearchServerVirtualization.com at firstname.lastname@example.org.
About the author: Alessandro Perilli is a recognized IT security and virtualization technology analyst. He is CISSP certified and is also certified in Check Point, Cisco, Citrix, CompTIA, Microsoft, and Prosoft. In 2006 he received the Microsoft Most Valuable Professional (MVP) award for security technologies. Perilli pioneered modern virtualization evangelism, and is the founder of the well-known blog virtualization.info. Alessandro Perilli is also the founder of the False Negatives project, a high quality IT security consulting and training business in Italy.