VMware VMotion allows users to migrate live virtual machines between servers, but it's currently limited to machines running CPUs in the same processor generation. Otherwise, a VM originally installed on one type of CPU may try to execute instructions that don't exist on another model.
That constraint may soon be a thing of the past, said representatives from both major x86 chip manufacturers, Intel Corp. and AMD Inc.
According to Tim Mueting, AMD virtualization solutions manager, the company is working with VMware Inc. to help the company take advantage of facilities within AMD's chips to mask differences between CPU generations, and therefore enable the safe live migration of VMs between servers running different AMD processors.
Mueting said he expects the functionality to be available "within a year, but maybe even sooner than that," and that it will work with Opteron Rev E and Rev F, AMD's forthcoming quad-core "Barcelona" and beyond.
In a similar vein, Intel in June announced a new feature for its forthcoming Xeon processor, code-named Penryn. The feature, called FlexMigration, will enable live migration between different generations of Intel processors.
Intel and AMD's work won't fix VMotion's limitations per se -- it will merely lay the foundation for VMware and others to take advantage of capabilities within the chips to manage differences in instruction sets. Nor are there any efforts underway to enable live migration between Intel and AMD processors.
"Our goal is to create flexibility, based on what [the customers] purchase today, for future generations of Intel processors," said Jake Smith, part of Intel's Advanced Server Technologies team. "We do not do cross-company enablement."
Differences between chips
Be that as it may, being able to migrate forward and backward between generations of chips, even within a single vendor's line, will be a huge improvement over the current situation. Beyond CPU vendor (AMD versus Intel) and family (e.g., Pentium 4, AMD K7), VMotion currently can also falter depending on the chips' level of 64-bit support, whether or not they support NX/XD (the ability to mark memory pages as non-executable), and what SSE level they support.
"Different generations of processors have different capabilities," said Bogomil Balkansky, VMware senior director of product marketing. For example, successive generations of x86 chips support different SSE levels, where SSE stands for Streaming SIMD (Single Instruction, Multiple Data) Extensions. Introduced in 1999, the original SSE added 70 new instructions to the x86 instruction set, mainly to deal with graphics. SSE-4, which Intel will ship with its Penryn chips, will add another 54 instructions.
When using VMotion to migrate a VM between different processor generations, "the behavior of the VM can become unpredictable," Balkansky said. On the off-chance that no incompatible instructions are being used, "it may be that the migration completes OK, but there aren't any good ways for us to predict that."
These limitations affect not only VMotion and its live migration peers, but also any software built on top of them, like VMware Distributed Resource Scheduler. CPU incompatibilities also prevent developers from taking a snapshot of a virtual machine and launching it on another machine.
Today, some VMware users get around these limitations by modifying a CPU's bit mask, thus hiding incompatible chip features so that VMotion can work safely between servers with different CPUs. Performed from VirtualCenter, modifying a CPU bit mask involves identifying exactly which CPUs are installed on a system and modifying specific configuration files. But while some of these techniques are documented on VMware discussion forums and in blogs, most of the CPU bit masks are not officially supported.
Furthermore, they're not necessarily easy to implement, said Scott Lowe, a blogger and consultant with ePlus Technology Inc. in Herndon, Va., who has experimented with the process. As it stands, "you need to identify exactly what bits need to be hidden or masked," he said – it's not built into any VMware product. Right now, he said, in the absence of more tools, "you're just shooting in the dark."
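The idea behind such a mask can be sketched in a few lines of Python. This is an illustration only, not VMware's procedure or configuration format: the helper names `common_mask` and `hidden_features` and the two example hosts are hypothetical, and only the bit positions (CPUID leaf 1, register ECX) are taken from the processor documentation. The mask is simply the set of feature bits that every host reports.

```python
# Bit positions in CPUID leaf 1, register ECX.
ECX_FEATURES = {
    "sse3": 0,
    "ssse3": 9,
    "sse4_1": 19,
    "sse4_2": 20,
}


def common_mask(host_ecx_values):
    """Return the ECX bits that every host in the list reports as set."""
    mask = 0xFFFFFFFF
    for value in host_ecx_values:
        mask &= value
    return mask


def hidden_features(host_ecx, mask):
    """Names of known features this host has but the mask hides from a guest."""
    hidden = host_ecx & ~mask
    return sorted(
        name for name, bit in ECX_FEATURES.items() if (hidden >> bit) & 1
    )


# Hypothetical hosts: an older server with SSE3 only,
# and a newer one with SSE3, SSSE3 and SSE4.1.
older = 1 << 0
newer = (1 << 0) | (1 << 9) | (1 << 19)

mask = common_mask([older, newer])
print(hex(mask))                     # 0x1
print(hidden_features(newer, mask))  # ['sse4_1', 'ssse3']
```

A real mask has to cover several CPUID leaves and registers, which is part of why working one out by hand is as error-prone as Lowe describes.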
What AMD and Intel can do
But AMD's Mueting said that for years now, AMD chips have included facilities such as the CPUID instruction and model-specific registers (MSRs) that supply information about the underlying CPU, the instructions it supports and its configuration. Now, the task at hand is to enable VMware and peers like Virtual Iron and XenSource to take advantage of those hooks for use within their live migration tools.
With information from CPUID on hand, the virtualization software will be able to constrain a virtual machine to only use those instructions that are available on the target server, Mueting said. For example, "if a VM were running on a Rev E chip, and it was migrated to Barcelona, VMware would mask that so that the VM still believed it was running on Rev E," he said. "The VM wouldn't gain any new virtualization-related processes," he said, but at least VMotion could guarantee that the migration would complete successfully.
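Mueting's Rev E example can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not how VMware or AMD implement the feature: the `Host` and `VirtualMachine` classes, the `power_on` and `can_migrate` functions, and the feature sets assigned to each chip are all illustrative. The point is that a VM is shown only a baseline feature set when it starts, so a later migration is safe whenever the target host supplies everything the guest was shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    features: frozenset  # features the physical CPU reports


@dataclass
class VirtualMachine:
    name: str
    visible: frozenset  # features the guest was shown at power-on


def power_on(vm_name, host, baseline):
    """Expose only features that are both in the baseline and on the host."""
    return VirtualMachine(vm_name, baseline & host.features)


def can_migrate(vm, target):
    """Safe only if the target offers everything the guest may be using."""
    return vm.visible <= target.features


# Illustrative feature sets, not an authoritative list for either chip.
rev_e = Host("rev-e", frozenset({"sse", "sse2", "sse3"}))
barcelona = Host("barcelona", frozenset({"sse", "sse2", "sse3", "sse4a"}))

# Masked: the VM starts on Barcelona but is shown only the Rev E baseline.
masked_vm = power_on("vm-masked", barcelona, baseline=rev_e.features)
print(can_migrate(masked_vm, rev_e))    # True

# Unmasked: the VM sees everything Barcelona offers, so moving back is unsafe.
unmasked_vm = power_on("vm-unmasked", barcelona, baseline=barcelona.features)
print(can_migrate(unmasked_vm, rev_e))  # False
```

The trade-off is the one Mueting notes: the masked VM gives up the newer chip's additional features in exchange for a migration that is known to be safe.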
Without going into specifics, Smith said that Intel's FlexMigration will allow systems administrators to mask certain chip-level features at set-up time. "The goal is for this capability to be enabled by sysadmins while they are configuring their virtual machine infrastructure [rather than] when you need it, because by the time you need it, it's typically too late."
A looming crisis?
The question remains: How big of a problem are CPU constraints today? Some VMware observers say that the VMware user community is fairly philosophical about VMotion's limitations.
"This is like saying, 'Man, that Lexus is great, but I wish it could fly!'" wrote Glenn Dekhayser, vice president of technology for Voyant Strategies Inc., a VMware and Virtual Iron reseller. "The very nature of VMotion requires the processor families to be equal, as the current run state of a server in bit form is dependent upon the CPU that runs it.... I've never had a client complain about this; this is a feature/benefit that is so compelling that you do what you need to, to make it work."
But Lowe said that most VMware shops are simply unaware of the problem for now, and that eventually the chickens will come home to roost. "A lot of organizations that have embraced VMware haven't gotten to the point in their lifecycle where they've addressed this yet," Lowe said.
"VMware really stormed the market maybe two years ago in terms of the enterprise," Lowe explained. Working with their vendors or integrators as part of a server consolidation project, many enterprises installed VMware on a cluster of identical servers. However, "at some point, they're going to get [to] the point where they want to add another platform [to their ESX server farm], and if they've been sold on the idea that they can simply use VMotion to do their migrations in a week rather than in three months, they're not going to be happy."
Let us know what you think about the story; email: Alex Barrett, News Director.