VMs can fail for a number of reasons, but with the proper configuration, vSphere High Availability can automatically detect the failure and attempt a restart -- as long as your system meets five criteria.
Restarting a failed VM in a high availability cluster is often more complicated than it seems because the original host system is likely no longer available. The cluster's master host must find another host system that is both available and capable of running the affected VM, and it must evaluate certain parameters before it restarts the failed VM.
The first step in the vSphere High Availability process is for the cluster's master host to determine whether the VM files are accessible. If the master can't find the necessary files, it can't restart the VM. In most cases, this requires the host to access a snapshot or VM image that is currently running on another active cluster node.
Next, the master node must determine whether other suitable host systems are available -- and whether the VM is even capable of running on those available host systems. The replacement host must be a different physical system than other nodes in the cluster. That way, you avoid running duplicate VM nodes on the same physical host, which would defeat the purpose of using vSphere High Availability. If there are no other compatible host systems available, it's impossible to restart the VM.
After the cluster's master host finds compatible host systems, the master considers any resource reservations on those systems. A system can reserve processors, memory, network interfaces and virtual flash. Before a VM can start, the potential host system must have enough unreserved resources available to meet its resource requirements. If there aren't enough resources available -- unreserved processor, memory, network interface or virtual flash capacity -- the VM won't be able to restart on that system.
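As a rough illustration of the reservation check described above, the following Python sketch compares a VM's reservations against a host's unreserved capacity. It is not vSphere code: the Host and VM classes, their field names and the has_unreserved_capacity function are hypothetical, and the sketch models only processor and memory, leaving out network interfaces and virtual flash.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical model of a candidate host; not a vSphere API object.
    name: str
    cpu_mhz: int                 # total processor capacity
    memory_mb: int               # total memory capacity
    reserved_cpu_mhz: int = 0    # capacity already reserved by other VMs
    reserved_memory_mb: int = 0

    def unreserved(self):
        """Return (free CPU in MHz, free memory in MB) after reservations."""
        return (self.cpu_mhz - self.reserved_cpu_mhz,
                self.memory_mb - self.reserved_memory_mb)


@dataclass
class VM:
    # Hypothetical model of the failed VM and what it needs reserved.
    name: str
    cpu_reservation_mhz: int
    memory_reservation_mb: int


def has_unreserved_capacity(host, vm):
    """True only if every modeled resource has enough unreserved capacity."""
    free_cpu, free_mem = host.unreserved()
    return (free_cpu >= vm.cpu_reservation_mhz
            and free_mem >= vm.memory_reservation_mb)


if __name__ == "__main__":
    vm = VM("db01", cpu_reservation_mhz=2000, memory_reservation_mb=4096)
    busy = Host("esx-a", 10000, 32768, reserved_cpu_mhz=9000, reserved_memory_mb=16384)
    idle = Host("esx-b", 10000, 32768, reserved_cpu_mhz=1000, reserved_memory_mb=1024)
    for host in (busy, idle):
        print(host.name, has_unreserved_capacity(host, vm))
```

In this sketch, a shortfall in any single resource is enough to rule out a host, which mirrors the all-or-nothing nature of the check.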
Next, the cluster's master host must check for any prevailing host limits. For example, the VM won't restart on a system if that action violates the maximum number of supported vCPUs or VMs. If restarting the VM on a given host would violate that host's limits, the master attempts to select an alternate host.
Finally, the master has to obey VM affinity or anti-affinity rules. For example, VM placement might be subject to VM affinity rules, which restrict the VM to a certain subset of available host systems. In contrast, VM anti-affinity rules prevent VMs from starting on certain systems -- even if those systems are available and meet other criteria. If no available systems meet VM affinity or anti-affinity rules, it's impossible to restart the VM.
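One way to picture how affinity and anti-affinity rules narrow the list of candidates is as set operations on host names. The Python sketch below is a simplified assumption, not how vSphere stores or evaluates its rules; the function hosts_allowed_by_rules and its parameters are hypothetical.

```python
def hosts_allowed_by_rules(candidates, must_run_on=None, must_not_run_on=()):
    """Filter candidate host names by hypothetical affinity rules.

    must_run_on:     affinity rule; if given, only these hosts are allowed.
    must_not_run_on: anti-affinity rule; these hosts are always excluded.
    """
    allowed = set(candidates)
    if must_run_on is not None:
        allowed &= set(must_run_on)    # keep only hosts the affinity rule permits
    allowed -= set(must_not_run_on)    # drop hosts the anti-affinity rule forbids
    return sorted(allowed)


if __name__ == "__main__":
    available = ["esx-a", "esx-b", "esx-c"]
    print(hosts_allowed_by_rules(available,
                                 must_run_on={"esx-a", "esx-b"},
                                 must_not_run_on={"esx-b"}))
```

An empty result corresponds to the case in which no available system satisfies the rules, so the VM can't be restarted.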
If conditions prevent a VM from restarting, the master triggers a log event noting that vSphere High Availability can't restart the VM, and vSphere High Availability will try to restart the VM later if conditions change.
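Taken together, the five criteria behave like a chain of checks that a candidate host must pass in full. The Python sketch below strings simplified versions of them together and logs a message when no host qualifies. Everything in it is an assumption for illustration: the dictionary keys, the CHECKS list and the choose_restart_host function are hypothetical and don't correspond to any vSphere interface.

```python
import logging

log = logging.getLogger("ha_sketch")

# Per-host checks, in the order the article describes them. Each takes the
# hypothetical vm and host dictionaries and returns True if the host passes.
CHECKS = [
    lambda vm, host: host["available"],                          # suitable host exists
    lambda vm, host: host["free_mb"] >= vm["reserved_mb"],       # unreserved resources
    lambda vm, host: host["vm_count"] < host["max_vms"],         # host limits
    lambda vm, host: host["name"] not in vm["forbidden_hosts"],  # anti-affinity rule
]


def choose_restart_host(vm, hosts):
    """Return the name of the first host that passes every check, else None."""
    # The VM's files must be accessible before any host is considered.
    if not vm["files_accessible"]:
        log.warning("cannot restart %s: VM files are not accessible", vm["name"])
        return None
    for host in hosts:
        if all(check(vm, host) for check in CHECKS):
            return host["name"]
    log.warning("cannot restart %s: no host meets all criteria", vm["name"])
    return None


if __name__ == "__main__":
    vm = {"name": "db01", "files_accessible": True,
          "reserved_mb": 4096, "forbidden_hosts": {"esx-d"}}
    hosts = [
        {"name": "esx-a", "available": False, "free_mb": 8192, "vm_count": 3, "max_vms": 10},
        {"name": "esx-b", "available": True, "free_mb": 8192, "vm_count": 10, "max_vms": 10},
        {"name": "esx-c", "available": True, "free_mb": 8192, "vm_count": 3, "max_vms": 10},
    ]
    print(choose_restart_host(vm, hosts))
```

A caller could invoke choose_restart_host again after conditions change, which loosely models the later retry.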