In theory, long-distance live migration sounds like a great idea. In reality, it’s fraught with limitations, complexity and high costs -- at least for now.
Within the confines of the data center, live migration technologies such as VMware vMotion are well-established tools for transferring virtual machines (VMs) from one physical server to another without downtime. Attempting to perform a vMotion between servers in geographically separate data centers, on the other hand, brings up latency issues and requires the use of replicated storage and Layer 2 stretched clusters. That’s a lot to ask of most organizations, said Joe Skorupa, Gartner research vice president for communications and networking.
“People want to be able to do it, but the storage replication component has long been the challenge,” he said.
Long-distance vMotion, take two
F5 Networks is proposing a new solution to the long-distance vMotion problem that eliminates several of its current requirements. Working with VMware and storage vendor NetApp, F5 proposed a method that combines its BIG-IP load balancing and wide area network (WAN) optimization products with NetApp FlexCache, a storage acceleration appliance.
F5’s approach aims to eliminate some of the roadblocks to performing long-distance live migration. VMware’s shared storage requirement for vMotion in vSphere 5, for example, assumes the presence of some form of active/active replicated storage, such as EMC VPLEX. And for a migrated VM to keep its network connectivity after the move, it must retain its IP address -- which presupposes some form of Layer 2 stretched cluster, the attendant networking infrastructure (such as Cisco OTV) and a fat WAN pipe.
But under the F5 scheme, when it’s time to initiate a long-distance vMotion, NetApp FlexCache begins migrating storage to the remote location, while F5’s BIG-IP Local Traffic Manager speeds up the data traffic over the WAN. Once FlexCache is done migrating the storage, it instructs vSphere to execute the vMotion event, and BIG-IP Global Traffic Manager begins redirecting traffic to the fully migrated remote VM.
The combination delivers long-distance live migration without requiring active/active storage or a Layer 2 stretched cluster, said Phil de la Motte, an F5 business development manager.
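The order of operations in that scheme can be sketched in a few lines of Python. This is only an illustration of the sequence as described, not F5, NetApp or VMware code: the function name and the four callables passed into it are hypothetical stand-ins for whatever actually drives FlexCache, vSphere and BIG-IP Global Traffic Manager.

```python
from typing import Callable, List


def long_distance_migration(
    vm_name: str,
    start_storage_migration: Callable[[str], None],  # hypothetical: kicks off the FlexCache copy
    storage_migration_done: Callable[[str], bool],   # hypothetical: True once the remote copy is complete
    run_vmotion: Callable[[str], None],              # hypothetical: asks vSphere to execute the vMotion
    redirect_traffic: Callable[[str], None],         # hypothetical: points client traffic at the remote VM
    max_polls: int = 1000,
) -> List[str]:
    """Run the steps in the order the article describes and return an event log."""
    events: List[str] = []

    # Step 1: storage starts moving to the remote site first.
    start_storage_migration(vm_name)
    events.append("storage_migration_started")

    # Step 2: nothing else happens until the storage copy has finished.
    polls = 0
    while not storage_migration_done(vm_name):
        polls += 1
        if polls >= max_polls:
            raise TimeoutError(f"storage migration for {vm_name} did not finish")
    events.append("storage_migrated")

    # Step 3: only then is the vMotion event executed.
    run_vmotion(vm_name)
    events.append("vmotion_executed")

    # Step 4: finally, traffic is redirected to the migrated VM.
    redirect_traffic(vm_name)
    events.append("traffic_redirected")
    return events
```

The point of the sketch is the ordering: the vMotion is gated on the storage copy completing, and traffic is redirected only after the VM has moved.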
F5 first showed off its long-distance live migration capability in 2009, using VMware Storage vMotion to move data between sites. At VMworld 2011, the company demonstrated live migration using EMC’s VPLEX Metro, whose globally coherent active/active storage platform removes the need to migrate storage data across sites.
The difference in the time it takes to do a long-distance live migration using Storage vMotion versus FlexCache is stark, de la Motte said.
“Starting from scratch, it could take days using Storage vMotion to move the data to the other side,” he said.
FlexCache, meanwhile, can prepopulate the remote array in the background, cutting back dramatically on the amount of data that needs to be transferred at the time of a vMotion, he added.
Desperately seeking disaster avoidance
F5 doesn’t have any customers for its long-distance live migration technology yet, but customers have expressed a lot of interest, mainly for disaster avoidance use cases, de la Motte said.
“The flood waters are rising,” he said. “A hurricane is coming. It’d be great to move your VMs over to a disaster recovery site in advance without having to incur downtime.”
Other potential use cases include data center migrations, consolidation projects in the wake of mergers and acquisitions, and even capacity planning.
But there are still a lot of obstacles to long-distance live migration, not the least of which is physics -- specifically, the speed of light. The 10-millisecond latency maximum in vSphere 5 limits long-distance live migration to about 250 miles, which is “just about enough distance to get out of a disaster zone” but not enough to provide comprehensive disaster recovery, de la Motte said.
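A rough back-of-the-envelope calculation shows how a latency ceiling turns into a distance ceiling. The sketch below rests on assumptions that are not from the article: that the 10 milliseconds is a round-trip budget, and that light travels through optical fiber at about two-thirds of its speed in a vacuum. The function names are made up for illustration.

```python
C_KM_PER_S = 299_792.458   # speed of light in a vacuum
FIBER_FACTOR = 2 / 3       # assumption: signal speed in fiber relative to a vacuum
KM_PER_MILE = 1.609344


def max_fiber_miles(rtt_budget_ms: float, overhead_ms: float = 0.0) -> float:
    """Longest fiber path (miles) whose round trip fits in the latency budget."""
    usable_ms = max(rtt_budget_ms - overhead_ms, 0.0)
    one_way_s = usable_ms / 2 / 1000          # half the budget each way
    km = C_KM_PER_S * FIBER_FACTOR * one_way_s
    return km / KM_PER_MILE


def propagation_rtt_ms(miles: float) -> float:
    """Round-trip propagation delay (ms) over a fiber path of the given length."""
    km = miles * KM_PER_MILE
    return 2 * km / (C_KM_PER_S * FIBER_FACTOR) * 1000


print(round(max_fiber_miles(10)))        # theoretical ceiling with no overhead
print(round(propagation_rtt_ms(250), 1)) # propagation delay alone at 250 miles
```

Under those assumptions, propagation alone would allow a fiber path of roughly 620 miles, and a 250-mile path uses about 4 milliseconds of the budget. The gap between that and the quoted 250-mile figure is plausibly taken up by indirect fiber routes and delay added by network equipment, though the article does not break it down.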
A bigger obstacle may be IT perception.
“People have to get comfortable with the idea of running multiple data centers as a single combined entity,” de la Motte said. “Running apps between multiple data centers has big benefits, but companies have to feel comfortable doing this.”
Other WAN optimization vendors, such as Riverbed Technology and Citrix Systems, will probably follow with their own long-distance live migration technologies, Gartner’s Skorupa said. But F5, which has 55% of the application delivery controller marketplace and a strong presence in medium-sized and large enterprises, “has gotten way out in front” by partnering with VMware and NetApp, Skorupa said.
Long-distance live migration is still “a non-trivial thing to do,” he said, but “it’s not really a science project anymore.”