

Prevent host failure with VMware DRS, HA and FT

Distributed Resource Scheduler, High Availability and Fault Tolerance are vSphere tools designed to ensure high performance and continuous operability in different ways.

VMware Distributed Resource Scheduler, High Availability and Fault Tolerance are all important features that improve workload availability and resilience. Each plays a part in keeping hosts and the VMs that reside on them running at a high performance level, even in the event of a host failure. Although the three can be used together to safeguard your environment, each performs a distinct task, and users should take care not to confuse them.


Let's start by taking a look at VMware Distributed Resource Scheduler (DRS). In a nutshell, DRS is a tool that uses vMotion to automatically balance workloads across the hosts in a cluster according to available resources. DRS promotes proper resource allocation by continuously monitoring cluster resources. When VMs on a host contend for resources, DRS migrates VMs to another host in the cluster. Through its Distributed Power Management (DPM) feature, DRS can also power down hosts that aren't needed, reducing power consumption.
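To make the balancing idea concrete, here is a minimal sketch in Python. It is a toy model, not VMware's actual DRS algorithm: the `Host` class, the `rebalance` function, the abstract resource units and the 20% threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int  # abstract resource units (hypothetical)
    vms: dict = field(default_factory=dict)  # VM name -> demand in the same units

    @property
    def utilization(self) -> float:
        return sum(self.vms.values()) / self.capacity


def rebalance(hosts, threshold=0.2, max_moves=10):
    """Greedy toy balancer: move one VM at a time from the busiest host to
    the idlest host until their utilization gap is within the threshold."""
    moves = []
    for _ in range(max_moves):
        busiest = max(hosts, key=lambda h: h.utilization)
        idlest = min(hosts, key=lambda h: h.utilization)
        gap = busiest.utilization - idlest.utilization
        if gap <= threshold or not busiest.vms:
            break
        # Candidate: the smallest VM on the busiest host.
        vm = min(busiest.vms, key=busiest.vms.get)
        demand = busiest.vms[vm]
        new_gap = abs(
            (sum(busiest.vms.values()) - demand) / busiest.capacity
            - (sum(idlest.vms.values()) + demand) / idlest.capacity
        )
        if new_gap >= gap:
            break  # the move would not narrow the gap, so stop
        idlest.vms[vm] = busiest.vms.pop(vm)  # stands in for a vMotion migration
        moves.append((vm, busiest.name, idlest.name))
    return moves


hosts = [
    Host("esx-a", 100, {"web1": 40, "db1": 50}),
    Host("esx-b", 100, {"app1": 10}),
]
moves = rebalance(hosts)
print(moves)
```

In this example, one migration (`web1` from `esx-a` to `esx-b`) brings both hosts to 50% utilization, after which the loop stops. A real scheduler weighs many more factors, such as CPU and memory separately, affinity rules and migration cost.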

When paired with VMware High Availability (HA), DRS can act as a first line of defense against host failure. HA is a utility that pools hosts and the VMs that run on them into a cluster. This allows HA to closely monitor the hosts in the cluster and more easily detect host failure. In the event of a failure, HA restarts the affected VMs on a different host. By using DRS and HA in tandem, you combine automatic failover with load balancing, allowing for faster rebalancing of workloads after the restart.
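The restart behavior can be sketched as a simple placement problem. The following Python is an illustrative model only and not how vSphere HA is implemented: the function name `restart_after_failure`, the capacity units and the "most free capacity first" rule are assumptions.

```python
def restart_after_failure(placement, capacity, demand, failed_host):
    """Toy HA model.

    placement: host name -> list of VM names
    capacity:  host name -> resource units (hypothetical)
    demand:    VM name   -> resource units
    Returns (restarted, unplaced): a dict of VM -> new host, and a list of
    VMs that no surviving host had room for.
    """
    survivors = {h: list(vms) for h, vms in placement.items() if h != failed_host}
    restarted, unplaced = {}, []
    # Place the largest VMs first so they are not squeezed out by small ones.
    for vm in sorted(placement[failed_host], key=demand.get, reverse=True):
        free = {
            h: capacity[h] - sum(demand[v] for v in vms)
            for h, vms in survivors.items()
        }
        candidates = [h for h in survivors if free[h] >= demand[vm]]
        if not candidates:
            unplaced.append(vm)
            continue
        target = max(candidates, key=free.get)  # host with the most headroom
        survivors[target].append(vm)
        restarted[vm] = target  # the VM is restarted here, so it incurs downtime
    return restarted, unplaced


placement = {"esx-a": ["web1", "db1"], "esx-b": ["app1"], "esx-c": []}
capacity = {"esx-a": 100, "esx-b": 100, "esx-c": 100}
demand = {"web1": 40, "db1": 50, "app1": 10}

restarted, unplaced = restart_after_failure(placement, capacity, demand, "esx-a")
print(restarted, unplaced)
```

Here the failure of `esx-a` leads to `db1` restarting on `esx-c` and `web1` on `esx-b`. The `unplaced` list shows why spare capacity matters: if the surviving hosts are already full, there is nowhere to restart the failed VMs.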

Fault Tolerance (FT) is another utility that protects against server failure. Although FT improves workload availability, it isn't the same thing as HA. For one, FT takes a completely different approach to ensuring workload resilience. Unlike HA, FT creates a duplicate of the primary VM. This secondary VM, also known as a shadow copy, waits in the wings until a host failure occurs, at which point it replaces the primary VM.

This creates an important distinction between HA and FT in terms of downtime: HA requires some downtime to restart the failed VM on a new server, whereas FT has the secondary VM available at a moment's notice, maintaining consistent uptime. It's for this reason that HA is better suited to non-mission-critical VMs and FT to mission-critical VMs. Administrators seeking an added layer of protection for their VMs can apply FT to an HA cluster.

Despite its ability to maintain uptime, FT has had relatively low adoption rates due to a single vCPU limitation and latency issues between the primary and secondary VMs. Fortunately, VMware has made significant changes to FT in vSphere versions 6 and 6.5 to address these issues, and FT now offers support for multiple vCPUs and reduced latency.
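The rule of thumb above can be written down as a small decision helper. This is a sketch only: the function `choose_protection` is hypothetical, and the default `ft_max_vcpus=4` is an assumption taken from a reader comment on this article about vSphere 6.x, so check the limit for your own version and license.

```python
def choose_protection(mission_critical: bool, vcpus: int, ft_max_vcpus: int = 4) -> str:
    """Return "FT" or "HA" for a VM, following the article's rule of thumb.

    ft_max_vcpus is an assumed per-VM vCPU ceiling for FT; it varies by
    vSphere version, so treat the default as a placeholder.
    """
    if mission_critical and vcpus <= ft_max_vcpus:
        return "FT"  # secondary VM takes over without a restart
    return "HA"      # VM is restarted on another host, with some downtime


inventory = [
    ("payments-db", True, 2),
    ("analytics", True, 8),   # too many vCPUs for FT under the assumed limit
    ("test-web", False, 1),
]
plan = {name: choose_protection(critical, vcpus) for name, critical, vcpus in inventory}
print(plan)
```

A mission-critical VM that exceeds the vCPU ceiling falls back to HA in this sketch, which mirrors the limitation that held back FT adoption.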



Which feature do you find most useful for improving workload availability: DRS, HA or FT?
I'd also add the option of vRealize Operations Manager and Predictive DRS. Being able to predict a host failure and let DRS move workloads off the host before it fails, so that HA never has to step in, is a nice feature. FT is also great, but in vSphere 5.5 it is only supported on VMs with 1 vCPU, and vSphere 6.x supports up to 4 vCPUs, so if you have workloads requiring more than 4 vCPUs, FT won't be applicable.
What is the main difference between HA and DRS?
Let's say that we have enabled HA and DRS, and we have a host failure.
I would like to know whether DRS or HA would take action.
HA will kick in first by restarting the affected VMs on another host in the cluster. This is a restart rather than a vMotion migration, because the VMs on a failed host are no longer running. After that, DRS will intelligently load balance the VMs if needed.