Virtualization aficionados say that tools like VMware VMotion and High Availability are bringing new levels of availability to their environments, without having to resort to complicated failover clustering software like Microsoft Cluster Server or Symantec Veritas Cluster Server.
Take Sun Media Corp., a Canadian media company headquartered in Toronto. The company began using virtualization software from VMware three years ago and has since virtualized 85 physical servers onto a pair of 16-way IBM x445s running ESX. At the same time, it has all but stopped using failover clustering software, turning to VMotion instead.
"We used to do a lot of clustering," said Timothy Happychuk, regional IT director at Sun Media's Winnipeg data center. "It was all we knew how to do in the physical world."
When VMware came along, Sun Media kept on clustering. This time, instead of clustering entire physical machines, they'd create clusters of virtual machines (VMs), but that didn't last long.
"[VMotion] works so well that clustering became an exercise in futility … almost pointless," Happychuk said. "It's so much more efficient."
Assuming that a virtual machine (VM) file (e.g., the VMware .vmdk) resides on shared network storage, VMotion migrates a running VM by remounting the file from one physical ESX host to another. As such, it can be used to migrate VMs before doing maintenance on a host. Coupled with another VMware tool -- Distributed Resource Scheduler -- VMotion can automatically migrate a VM if it isn't getting sufficient resources, or if another VM on that host requires more performance.
In theory, failover clustering goes one step further than live migration and enables you to withstand "unplanned downtime" like an application crash or hardware failure. Failover clustering software usually assumes two physical machines, designating one as the primary and the other as the failover target. Clustering software installed on the machines emits a "heartbeat"; if the heartbeat goes undetected, that is the signal for the standby machine to launch the application.
Equipped with extensions for applications like a database or email server, failover clustering can restart a service not only if the operating system goes down, but also if the application itself fails.
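The heartbeat mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not how Microsoft Cluster Server or Veritas Cluster Server is implemented; the class name StandbyMonitor, its timeout parameter and the app_healthy flag are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical standby-side view of a two-node failover cluster."""
    timeout: float              # seconds of silence tolerated before failover
    last_heartbeat: float = 0.0
    active: bool = False        # True once the standby has taken over

    def record_heartbeat(self, now: float) -> None:
        # The primary is alive; remember when we last heard from it.
        self.last_heartbeat = now

    def check(self, now: float, app_healthy: bool = True) -> bool:
        """Return True if the standby should be running the application."""
        heartbeat_missed = (now - self.last_heartbeat) > self.timeout
        # An application-aware extension can also trigger failover while
        # the primary's operating system is still sending heartbeats.
        if heartbeat_missed or not app_healthy:
            self.active = True
        return self.active


monitor = StandbyMonitor(timeout=3.0)
monitor.record_heartbeat(now=10.0)
print(monitor.check(now=12.0))   # heartbeat is recent, standby stays idle
print(monitor.check(now=14.5))   # 4.5 seconds of silence, standby takes over
```

The app_healthy argument stands in for the application extensions mentioned above: it lets the standby take over even when heartbeats are still arriving.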
Windows failover a failure?
But a lot of Windows administrators have had less than successful experiences with clustering. Eddy Caballero, systems analyst with Greenberg Traurig, a law firm headquartered in Miami, Fla., said his firm used to use clustering software but "there's just no need for it under VI3."
Clusters, Caballero said, "work great if you leave them alone" but sometimes administrators will forget that a certain server is in a cluster, install an application, and it fails. "It's supposed to work, but does it? I don't know," he said.
That sentiment doesn't surprise Gordon Haff, principal IT analyst at Illuminata, a research firm in Nashua, NH, especially when it comes to Windows applications where virtualization has taken hold.
"Clustering has always been popular in certain areas like high-end Unix systems, but it never really took the volume server market by storm because it's pretty complex," Haff said. In fact, "VMotion is probably already used more widely than clustering ever was in the volume systems space."
The VMotion message seems to be taking hold among users. Speaking at the IDC Virtualization Forum in New York City last week, John Humphreys, IDC program director for enterprise virtualization, said that in 2006, 40% of users listed "live migration" as a primary reason to virtualize, up from 25% just a year before.
"Virtualization 2.0 is all about mobility," Humphreys said, and predicted that by 2010, the majority of virtualization spending will be on virtualization-based disaster recovery and high availability technologies – "all predicated on the ability to move from one host to another."
For now, the only other vendor besides VMware to offer live migration capabilities is Virtual Iron, which offers LiveMigrate. XenSource promises this capability in the spring of this year, and Microsoft plans to include it in Windows Server Longhorn virtualization, due out by mid-2008.
Another reason not to bother
Now, with the availability of VMware HA, the reasons to cluster seem to be getting fewer and farther between.
The high availability software for VMware ESX monitors the ESX host for signs of failure and automatically restarts any failed VMs on an alternate host.
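As a rough illustration of that restart behavior, the Python sketch below reassigns the VMs of a failed host to the surviving hosts. It is a simplified model under stated assumptions: the function restart_failed_vms, the host names and the least-loaded placement rule are hypothetical and are not VMware's actual placement logic.

```python
def restart_failed_vms(placement, failed_host):
    """Reassign VMs from a failed host to the surviving hosts.

    placement maps a host name to a list of VM names (hypothetical model).
    Returns a new mapping; the input is not modified.
    """
    survivors = {
        host: list(vms)
        for host, vms in placement.items()
        if host != failed_host
    }
    if not survivors:
        raise RuntimeError("no surviving host available")
    for vm in placement.get(failed_host, []):
        # Pick the survivor running the fewest VMs. This rule is a
        # simplification chosen for illustration only.
        target = min(survivors, key=lambda host: len(survivors[host]))
        survivors[target].append(vm)
    return survivors


before = {"esx1": ["mail", "web"], "esx2": ["db"], "esx3": []}
after = restart_failed_vms(before, "esx1")
print(after)
```

A real product would also have to consider CPU and memory headroom on each surviving host; counting VMs is used here only to keep the example short.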
Greenberg Traurig's Caballero recently saw VMware HA in action. An ESX host had seized after having a new network interface card (NIC) installed, but Caballero only found out because Systems Insight Manager -- the monitoring software that managed the Hewlett-Packard server -- alerted him to the hardware failure.
Microsoft Operations Manager (MOM), which was in charge of monitoring the actual VMs, never noticed the failure because by the time it had sent out a probe, VMware HA had already restarted the VMs from the failed host on a different machine.
"MOM didn't have time to see the failure," Caballero said.
Where clustering lives on
But industry experts and clustering vendors dispute the idea that clustering is becoming irrelevant in virtual environments.
"A lot of people think that the days of clustering are over because of virtualization, but it turns out that it's a bit more nuanced than that," said Jean Bozman, vice president of global enterprise server solutions for IDC in Framingham, Mass. "Virtualization solves some of the problems, but not all of them."
In fact, virtualization can actually increase both the likelihood and the severity of downtime, which increases the need for clustering, said Bob Williamson, vice president of product marketing and management at the Palo Alto, Calif.-based Steeleye Technology Inc., maker of the LifeKeeper clustering suite for Windows and Linux.
"Some people think that virtualization can achieve the same thing as HA clustering, but that's wrong," Williamson said.
For one thing, adding a virtualization layer -- the hypervisor -- adds a layer of complexity to the environment. "The most common cause of downtime comes from the software stack, usually related to I/O drivers," he said.
For another, consolidating multiple servers on a virtualization host creates the "all-your-eggs-in-one-basket problem," Williamson said. "If you experience a hardware failure, more [virtual] servers are at risk of being down."
And while Williamson conceded that the combination of VMotion and VMware HA successfully protects against planned downtime and hardware failures, it has its limits: It does nothing to monitor the health of the applications and services running within the VMs, nor does it allow users to restore to a non-virtual host.
Steeleye's LifeKeeper, like Symantec's VCS, can be equipped with optional agents for key enterprise applications such as databases, Exchange, SAP, and the like. LifeKeeper also comes with a software development kit that will let you cluster-enable a homegrown application.
Furthermore, most clustering software can be paired with long-distance data replication software to protect against site downtime.
And going forward, Williamson said that Steeleye plans to deliver a product that clusters VMware's own VirtualCenter management application, and will allow users to install LifeKeeper within the ESX console rather than within individual machines. Symantec VCS for ESX already supports a similar configuration.
Symantec launched VCS for ESX in January and has since discovered numerous benefits to running VCS within the ESX console, said Jason Nadeau, Symantec senior product manager for VCS. One is a reduction in the number of VCS instances you need to install. Also, since VCS runs "underneath" the guest OS, it relieves the application owner from having to know about clustering.
"The app admins can be blissfully unaware that hardware clustering is going on," Nadeau said.
Let us know what you think about the story; email: Alex Barrett, News Director