VMware defends its upcoming fault-tolerance feature
During VMworld 2008 in Las Vegas last week, VMware Inc. announced its upcoming fault tolerance feature and gave a demonstration of it during one of the keynote sessions. It looked pretty good and simple to use, but Littleton, Mass.-based Marathon Technologies Corp., a company that specializes in fault tolerance software, had plenty to say otherwise.
In response to Marathon’s blog dissin’ the upcoming feature, Palo Alto, Calif.-based VMware‘s Mike DePetrillo, a principal systems engineer, wrote a blog defending VMware Fault Tolerance.
For starters, Marathon complained that VMware does not provide component-level fault tolerance. “The most common failures that result in unplanned downtime are component failures such as storage, NIC [network interface card] or controller failures. Yet VMware Fault Tolerance doesn’t do anything to protect against I/O, storage or network failures.”
DePetrillo noted that VMware already has features to protect again component failure. “If your NIC fails you’ve got NIC teaming built into the system. To set it up simply plug in both NICs to the server, go into the network panel and attach both of them to the same virtual switch. Done. Four clicks. Same thing for storage with the built-in SAN [storage area network] multipathing drivers,” DePetrillo wrote. “I absolutely agree with the author that component failures are the cause of most crashes and that’s why VMware added these features in 2002. VMware FT is not designed for component failure because there’s no sense in moving the VM to another host if you’ve simply lost a NIC uplink. NIC teaming will take care of that with ease and is a LOT cheaper than using CPU and memory resources on another host to overcome the failure.”
Marathon’s second beef: VMware’s fault tolerance is too complex. “In order to use VMware Fault Tolerance, you’ll first have to install both VMware HA [High Availability] and DRS [Distributed Resource Scheduler]. No small feat in and of themselves. Then, because VMware FT requires NIC teaming, you’ll also have to manually install paired NICs. Then you’ll need to manually set up dual storage controllers (with the software to manage them) because it requires multipathing. And to top it all off, you’re required to use an expensive, and often complicated, SAN.”
DePetrillo said the process requires checking off two boxes – HA and DRS. That’s it. “If that’s too hard then please comment and let me know how it could possibly be easier. Even my dog has figured out how to do this now. Granted, it’s a pretty smart dog.”
“As for setting up the dual NICs and dual HBAs [host bus adapters], well, yes, you have to actually plug the physical devices in. After you’ve done that the **built-in** NIC teaming and HBA drivers will take over and configure most everything for you. The NIC teaming does require four extra clicks. The HBA drivers actually figure out the failover paths, match them up, and set up the appropriate form of failover all auto-magically. They’ve been doing this since ESX 1.5 (6 years ago),” DePetrillo blogged.
“Lastly, yes, this requires shared storage. Pretty sure that most environments that want FT (no downtime what-so-ever because our business could lose millions) already have a SAN to take advantage of other things virtualization related such as DRS and VMotion,” he wrote.
Also, VMware FT does not require dual NICs or dual HBAs because, DePetrillo said, “This is something you should have in every virtualization setup that’s running VMs you care anything about, but it’s not a requirement to get VMware FT [Fault Tolerance] running.”
The last point Marathon makes that’s worth spending any time on is that VMware offers onlylimited CPU fault tolerance. “With VMware FT, you’ll need to set up what VMware refers to as a “record/replay” capability on both a primary and secondary server. If something happens to the primary server, the record is stored on the SAN and then restarted on the secondary server. … The whole thing depends on the quality of the SAN. Second, in the words of the VMware engineer who presented at VMworld, “this can take a couple of seconds.” So what happens to your application state in those couple of seconds?”
DePetrillo’s defense is that “if you’re the type of company that requires absolutely no downtime for an app — if the app is just that critical — then I’m pretty sure you’re going to have a decent SAN. … If you’re having so many problems with your SAN that you don’t trust it for FT, then you have much bigger issues at hand that VMware or Marathon or any of the other virtualization related vendors aren’t going to help you with.”
You can read more of VMware’s comments on DePetrillo’s blog, which gets into some details on how VMware Fault Tolerance will work, and vice versa for Marathon.
But I think it is obvious that Marathon is making VMware’s fault tolerance feature seem worse than it is, and VMware is making its new feature seem simpler than it is.
For the most part, this is a pissing contest between the incumbent fault-tolerance vendor and the “new guy,” but the fact of the matter is, if you use VMware virtualization, you can’t use Marathon Technologies because they don’t support VMware (obviously) and if you use Citrix Systems’ XenServer, you can’t use VMware Fault Tolerance, so these arguments are moot.
Join the conversation
4 comments