VMware vSphere's High Availability service can automatically restart VMs that have stopped responding, but you must use vSphere HA application monitoring to track the health of the applications running inside them.
The principal mechanism for gauging a VM's responsiveness is a heartbeat, an artificially generated signal that VMware Tools produces and the vSphere VM Monitoring service receives. If the heartbeat is absent for longer than a prescribed time, vSphere deems the VM non-responsive.
A missing heartbeat usually indicates a fault in the guest OS or that VMware Tools isn't functioning, often because it isn't getting compute time. In either case, vSphere High Availability (HA) can trigger a restart of the affected VM.
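To make the timeout logic concrete, here is a minimal Python sketch of heartbeat-based failure detection. It is a toy model, not VMware's implementation: the `HeartbeatMonitor` class, its `failure_interval` parameter and the simulated clock values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Toy model of heartbeat-based failure detection (not VMware's code).

    failure_interval: hypothetical name for the number of seconds a heartbeat
    may be absent before the VM is considered non-responsive.
    """

    failure_interval: float
    last_heartbeat: float = 0.0

    def record_heartbeat(self, now: float) -> None:
        # Called whenever the (simulated) VMware Tools heartbeat arrives.
        self.last_heartbeat = now

    def is_unresponsive(self, now: float) -> bool:
        # Flag the VM only when the gap exceeds the prescribed time.
        return now - self.last_heartbeat > self.failure_interval


monitor = HeartbeatMonitor(failure_interval=30)
monitor.record_heartbeat(now=100)
print(monitor.is_unresponsive(now=120))  # False: 20-second gap
print(monitor.is_unresponsive(now=135))  # True: 35-second gap
```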
On its own, the VM heartbeat isn't a perfect indication of a VM's condition or functionality. In some cases the heartbeat stops while the VM and its application continue to function normally, and vSphere HA might then restart the VM unnecessarily.
To improve VM monitoring and prevent unnecessary VM restarts, the VM Monitoring service in vSphere HA can also check the VM's I/O to determine disk or network activity -- a fundamental indication of application activity.
In addition to the regular heartbeat, VM Monitoring checks for I/O activity over the previous two minutes by default. If the VM heartbeat is missing but there is recent I/O activity, the VM workload might still be working, so vSphere won't restart the VM. If the heartbeat is missing and there's no I/O activity within that window, the cluster's master node can restart the affected VM.
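The combined heartbeat and I/O check can be modeled as a simple decision function. This is a hypothetical sketch rather than vSphere's actual logic; the function name and the 30-second heartbeat threshold are assumptions, while the 120-second I/O window mirrors the two-minute default described above.

```python
def should_restart(
    seconds_since_heartbeat: float,
    seconds_since_io: float,
    failure_interval: float = 30,  # assumed heartbeat threshold
    io_window: float = 120,  # two-minute I/O window from the text
) -> bool:
    """Hypothetical model of the VM Monitoring restart decision."""
    heartbeat_missing = seconds_since_heartbeat > failure_interval
    # Any disk or network I/O inside the window suggests the workload is alive.
    recent_io = seconds_since_io <= io_window
    return heartbeat_missing and not recent_io


print(should_restart(90, 15))   # False: heartbeat lost, but I/O 15 seconds ago
print(should_restart(90, 300))  # True: heartbeat lost and no I/O for 5 minutes
print(should_restart(10, 300))  # False: heartbeat still arriving
```

Requiring both conditions is what prevents the unnecessary restarts described earlier: a lost heartbeat alone is not enough.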
Beyond VM heartbeats and I/O activity, VMware also supports vSphere HA application monitoring, which lets you configure custom heartbeats for select applications. This requires either an application that already supports vSphere HA application monitoring or an SDK that you can integrate into the application.
vSphere HA application monitoring works almost exactly like the VM Monitoring service. Once you enable it and the application produces a custom heartbeat, vSphere HA restarts the VM if the application's heartbeats stop for a specified period of time.
You can also select a sensitivity level for vSphere HA application monitoring: high sensitivity flags heartbeats that are absent for more than 30 seconds, medium for more than one minute and low for more than two minutes.
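The three sensitivity presets map naturally onto a lookup table of failure intervals. The sketch below uses the 30-, 60- and 120-second values from the text; the `Sensitivity` enum and `heartbeat_failed` function are hypothetical names for illustration only.

```python
from enum import Enum


class Sensitivity(Enum):
    # Failure intervals in seconds, taken from the sensitivity levels above.
    HIGH = 30
    MEDIUM = 60
    LOW = 120


def heartbeat_failed(gap_seconds: float, sensitivity: Sensitivity) -> bool:
    # "Absent for more than N seconds" means strictly greater than N.
    return gap_seconds > sensitivity.value


for level in Sensitivity:
    print(level.name, heartbeat_failed(45, level))
```

With a 45-second gap, only high sensitivity treats the heartbeat as failed.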
You can also configure custom monitoring periods. Shorter windows detect troubled VMs faster and so lead to earlier restarts, but they also increase the likelihood of false positives.
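The trade-off of a shorter window can be illustrated by counting how many harmless heartbeat gaps each window would misread as failures. The gap values below are made-up sample data for a healthy VM, not measurements, and the function name is hypothetical.

```python
# Hypothetical heartbeat gaps (seconds) observed on a healthy VM,
# e.g. short stalls while the host was under heavy load.
HEALTHY_GAPS = [5, 12, 40, 8, 75, 20]


def false_positives(window: float, gaps: list[float]) -> int:
    # Every healthy gap longer than the window would trigger a needless restart.
    return sum(1 for gap in gaps if gap > window)


for window in (30, 60, 120):
    # A genuine failure is noticed roughly `window` seconds after the last heartbeat.
    print(f"window={window:>3}s  detection delay~{window}s  "
          f"false positives={false_positives(window, HEALTHY_GAPS)}")
```

With these sample gaps, a 30-second window flags two healthy pauses, a 60-second window flags one and a 120-second window flags none, at the cost of slower detection of genuine failures.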