And that fear is one reason that Savannah, Ga.-based Sullivan Group recently scrapped its VMware virtualization software and adopted Citrix Systems Inc.'s XenServer, plus Marathon Technologies' everRun VM fault-tolerance software.
As a provider of human resources outsourcing services for small and medium-sized businesses, Sullivan Group has about 40 employees and only two IT administrators. For a couple of years, the firm used the free VMware Server in its production environment, and the free product had worked out. But when the company secured a large client, it decided it needed to grow its virtualization environment to accommodate that client and to promise constant uptime. Its choices were to invest in commercial VMware software or defect.
Defecting from VMware
As an open source Linux shop, the company was interested in open source Xen and had tested it, along with VMware. Once the company identified which features (including fault tolerance) it needed, Citrix XenServer emerged as a no-brainer choice, explained Rob Jones, the IT director at Sullivan Group.
"We considered VMware, but for the cost and what we needed, a XenServer Enterprise license plus everRun VM were fine." Plus, earlier this year, "when we looked at VMware High Availability [HA], it was not at the same level as everRun VM; there wasn't an automatic failover feature to prevent downtime," Jones said. "Since we liked Xen and Linux, instead of investing in VMware we decided to go with XenServer and use Marathon [everRun VM] for fault tolerance."
In contrast to pricey proprietary fault-tolerant hardware from the likes of Stratus Technologies and Hewlett-Packard Co., with its NonStop servers, Marathon's everRun VM works by creating redundant VMs and maintaining synchronized mirrors of network, storage and data. The software monitors the target server to ensure that resources are available if and when VMs need to be moved there. This stands in contrast with VMware HA, which does not guarantee that the resources on a failover machine will be available when needed. VMware HA also requires manual setup and testing, and unlike everRun VM, it times out when a failover occurs, so there could be a few minutes of downtime before VMs fail over.
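To make the resource-guarantee idea concrete, here is a minimal Python sketch of the kind of check a monitor might run before counting on a standby host. It is an illustration only, not Marathon's or VMware's implementation: the `Host` and `VM` classes, the `can_fail_over` function and all of the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical description of a standby server's spare capacity.
    name: str
    free_cpus: int
    free_memory_gb: int


@dataclass
class VM:
    # Hypothetical description of a protected VM's resource needs.
    name: str
    cpus: int
    memory_gb: int


def can_fail_over(vms, target):
    """Return True if the target host can hold all of the given VMs at once."""
    needed_cpus = sum(vm.cpus for vm in vms)
    needed_memory = sum(vm.memory_gb for vm in vms)
    return (needed_cpus <= target.free_cpus
            and needed_memory <= target.free_memory_gb)


# Illustrative values only; none of these figures come from Sullivan Group.
protected = [VM("hr-app", cpus=2, memory_gb=4), VM("db", cpus=4, memory_gb=8)]
standby = Host("standby", free_cpus=8, free_memory_gb=16)

print(can_fail_over(protected, standby))
```

A monitor that runs a check like this continuously can raise an alarm before a failure occurs, rather than discovering a shortfall in the middle of a failover.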
VMware plans to remedy this with its own fault-tolerance software, which will ship in 2009. In the meantime, some IT administrators opt to protect workloads with application-level clustering software, such as Symantec Veritas Cluster Server (VCS) and Microsoft Cluster Server (MSCS).
Microsoft, on the other hand, doesn't have any plans to write fault-tolerant software for Hyper-V, according to Zane Adam, the senior director of virtualization product management and marketing. "We don't see this as an area of high demand right now, but we are watching this closely," he said.
With everRun VM software, there are three levels of failover. Level one, or basic failover, comes standard with XenServer 5 Enterprise and Platinum editions for failure detection and auto restart, as well as failover for some applications. The second level is component-level fault tolerance designed for business-critical applications that require little to no downtime. (The Sullivan Group, for example, uses component-level fault tolerance.) The third level of everRun VM, system-level fault tolerance, targets applications that can never go down.
everRun VM starts at $2,000 per server, or $4,500 per server when bundled with XenServer Enterprise. In comparison, VMware Infrastructure is priced at $5,720, which includes VMware HA.
A wise move
Today, Sullivan Group runs 25 to 30 VMs on each of its three 64-bit physical XenServer hosts, including about 40 virtual desktops, Jones said. For its part, everRun VM is configured to protect against component failure and hasn't caused any noticeable latency on Sullivan's systems. Management happens through Citrix XenServer Center and everRun VM Availability Center, which are accessed separately but interoperate.
After just a couple of weeks of running XenServer and everRun VM, the combination was put to the test. The company was running its Windows-based human resources software, SamWare, when a RAID group on one of its servers failed. Without fault-tolerance software, this would have been a catastrophe. But everRun did its job, said Erika Simpson, a network administrator at the Sullivan Group.
"It moved the live production servers to a secondary server without us even knowing. It did what it was supposed to do, and end users had no clue there was an issue," she said. "Once the problem was fixed, the VMs were automatically migrated back onto the primary server."