The ability to quickly move virtual machines off a node can be vital when a host has an impending problem, and it streamlines evacuating VMs for patching and maintenance. Live evacuation in a Hyper-V 2012 cluster lets admins move every VM off a host at once, rather than live-migrating them one at a time, without having to use System Center Virtual Machine Manager. It is also available with the free Hyper-V Server platform.
Clustering Hyper-V nodes is a great way to give workloads high availability. In the event that one node crashes, the cluster can recover the VM on another node and reboot it. But what if the cluster node is still running but has an impending hardware problem, like a bad memory module or a disk controller issue? In this case, you should evacuate -- or move VMs off -- the host as fast as possible to avoid downtime.
System Center Virtual Machine Manager (SCVMM) can help by putting a Hyper-V 2012 cluster node in Maintenance Mode, but in many cases, if a host is having problems, SCVMM is of little use for real-time response. Smaller organizations also might not have the money to purchase the System Center suite but still might want a way to quickly move VMs off a node. In these cases, admins can turn to the built-in Failover Cluster Manager or PowerShell.
There are several ways to move VMs from one node to another with these tools -- and some common pitfalls if hosts are not correctly architected.
Pausing and draining nodes
Live evacuation of VMs -- also known as "pausing and draining" the VM workloads -- from a cluster node is the process of moving all resources off the original node onto one or more remaining cluster nodes. This process can be used for any clustered resource, but in the case of VMs, it uses Live Migration to move the VMs from one node to another with no downtime. Previous versions of Hyper-V required one-by-one migrations. In Windows Server 2012 and Windows Server 2012 R2, you can move all VMs off a node with just a few clicks.
Pausing a Hyper-V 2012 cluster node and moving all VMs off it to the remaining nodes takes only a few steps:
- Open Failover Cluster Manager.
- Connect to your cluster name if it does not do this automatically.
- Right-click on the node from which you want to Live Evacuate the VMs, and choose Pause, then Drain Roles.
Once started, your VMs will Live Migrate to the alternate node(s). How many move at the same time depends on the Simultaneous Live Migrations value you set for the host in Hyper-V Manager, under Hyper-V Settings.
Be careful about how aggressively you set your simultaneous Live Migration value. Trying to Live Migrate too many VMs at once can congest network bandwidth, and the migration will actually take longer than if you had used smaller batches. Windows Server 2012 R2 adds compression and Server Message Block (SMB) transport options that speed up Live Migration, so expect to see your migration times cut significantly.
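The simultaneous Live Migration limit and, on Windows Server 2012 R2, the transport option can also be checked and changed from PowerShell. The following is a minimal sketch, assuming the Hyper-V PowerShell module is installed and the commands run in an elevated session on the host itself. The value of 2 and the choice of Compression are illustrative only, not recommendations.

```powershell
# Assumption: the Hyper-V module is installed and this session is elevated.
Import-Module Hyper-V

# Show the current Live Migration settings for the local host.
Get-VMHost |
    Select-Object ComputerName, VirtualMachineMigrationEnabled, MaximumVirtualMachineMigrations

# Lower the number of simultaneous Live Migrations (2 is only an example value).
Set-VMHost -MaximumVirtualMachineMigrations 2

# Windows Server 2012 R2 only: pick the transport (TCPIP, Compression, or SMB).
Set-VMHost -VirtualMachineMigrationPerformanceOption Compression
```

Run it on each cluster node, because these are per-host settings rather than cluster-wide ones.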
If you have been working with cluster graphical user interface (GUI) consoles for any amount of time, you know the console can become unresponsive. This is where a scripted alternative can be more effective. In Windows Server 2012 and 2012 R2, the FailoverClusters PowerShell module is loaded automatically if you have the Failover Cluster Manager tools installed:
Suspend-ClusterNode -Name <NodeName> -Cluster <ClusterName> -Drain
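When the drain is scripted, it helps to wait for it to finish and then confirm nothing was left behind. The sketch below is one way to do that, assuming the FailoverClusters module is available; HV-NODE1 and HV-CLUSTER are hypothetical names to replace with your own.

```powershell
# Hypothetical names; replace with your own node and cluster.
$node    = 'HV-NODE1'
$cluster = 'HV-CLUSTER'

Import-Module FailoverClusters

# Pause the node, drain its roles, and block until the drain finishes.
Suspend-ClusterNode -Name $node -Cluster $cluster -Drain -Wait

# Confirm the node is paused and see how the drain ended.
Get-ClusterNode -Name $node -Cluster $cluster |
    Select-Object Name, State, DrainStatus

# List any cluster groups the node still owns.
Get-ClusterGroup -Cluster $cluster |
    Where-Object { $_.OwnerNode.Name -eq $node } |
    Select-Object Name, GroupType, State
```

If the last command returns nothing, the node no longer owns any clustered roles and should be safe to work on.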
Resuming the node
Resuming the paused node should be a priority as soon as the problem is addressed or maintenance is completed. While the node is paused, the cluster runs with less spare capacity: if you lose another node and the remaining nodes don't have enough resources, you could experience significant VM downtime.
Now that you have evacuated the VMs from the node to fix a problem or reboot, you will need to resume the paused node. To do this, you need to do the following:
- Open Failover Cluster Manager.
- Right-click on the paused node.
- Select Resume.
- Choose either "Do Not Fail Roles Back" or "Fail Roles Back."
- If you choose "Do Not Fail Roles Back," you can now manually migrate your VMs over to the node you just resumed to balance resources.
Choosing to Fail Roles Back is also an option if you need to move exactly the same VMs back to the newly refreshed node. But in most cases, admins are patching several hosts and will choose the Do Not Fail Roles Back option -- allowing them to migrate VMs from the next host onto the patched one. You can choose to do either; but in my experience, unless you have a reason to keep the VMs on the original node, you should choose Do Not Fail Roles Back.
Hopefully the misbehaving cluster node is corrected and the Failover Cluster GUI is now responsive. If it isn't, or if you prefer a scripted method, you can try the following PowerShell command:
Resume-ClusterNode -Name <NodeName> -Cluster <ClusterName>
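The two GUI choices for failing roles back can also be expressed on the command line through the -Failback parameter of Resume-ClusterNode. The sketch below assumes the FailoverClusters module is available; HV-NODE1, HV-CLUSTER, and VM01 are hypothetical names used only for illustration.

```powershell
# Hypothetical names; replace with your own node, cluster, and VM role.
$node    = 'HV-NODE1'
$cluster = 'HV-CLUSTER'

Import-Module FailoverClusters

# Resume the node and leave the VMs where they are ("Do Not Fail Roles Back").
Resume-ClusterNode -Name $node -Cluster $cluster -Failback NoFailback

# Alternative ("Fail Roles Back"): move the drained roles back right away.
# Resume-ClusterNode -Name $node -Cluster $cluster -Failback Immediate

# After a NoFailback resume, live-migrate a chosen VM onto the resumed node
# to rebalance resources. 'VM01' is a placeholder clustered VM role name.
Move-ClusterVirtualMachineRole -Name 'VM01' -Node $node -Cluster $cluster -MigrationType Live
```

Resuming with NoFailback and then moving VMs by hand matches the rolling-patch approach described above, where the next host's VMs are migrated onto the freshly patched one.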
Using the pause and drain options from within the Failover Cluster Manager can ease migrations for smaller organizations that may not be able to justify purchasing System Center Virtual Machine Manager. Test it out, and add it as another option when you need to free up a Hyper-V 2012 cluster node quickly. For a video of this process, check out my blog at VirtuallyAware.com.