Server virtualization provides countless benefits over the use of physical servers, but it has at least one significant...
Hardware failure in a virtualized environment has more severe consequences than in a physical server environment. If a server fails in a physical environment, then it generally only affects a single workload. However, if a virtualization host fails, all VMs running on the host also fail. And that could result in a major outage.
Protect against a virtualization host failure
Using failover clustering to achieve high availability for VMs can fix this problem. As its name implies, in the event of a host server failure, failover clustering allows VMs that run on the failed host to fail over to another host in the cluster, where they can continue running.
Because of the potential consequences of a host server failure, the use of failover clustering has become the standard and accepted way of operating a virtualized environment. But failover clustering can be expensive to implement, as it includes costs for licensing the virtualization software as well as hardware costs for servers and shared storage arrays. And that doesn't even account for the cost to support and maintain the failover cluster. To avoid the cost and complexity associated with building and maintaining a clustered hypervisor environment, it's worth asking if it's acceptable to use a stand-alone, nonclustered virtualization host instead.
It's fairly common for organizations to use nonclustered virtualization hosts in lab environments because IT departments often have limited budgets for dev/test labs. Besides, there probably aren't going to be dire consequences if a failure occurs in a lab environment. This isn't the case in production environments.
On the surface, using a stand-alone virtualization host in a production environment seems like a clear violation of long-established best practices, but it's acceptable in the following situations.
When to use a stand-alone host
The most common example of using a stand-alone host in a production environment happens in very small businesses. Small companies that have only a few employees often have a single virtualization host running a handful of VMs. Even though this practice is somewhat common, it's still risky and ill-advised because a host server failure can result in a total outage. For organizations such as these, which might lack the budget or the expertise to deploy a failover cluster, it's better to run production workloads in the public cloud than to rely on a stand-alone virtualization host.
Organizations that have some form of redundancy in place can also use a stand-alone host. For example, Hyper-V can replicate VMs to a secondary or tertiary host -- without building a failover cluster. Replication of this kind is easier and less expensive to set up than a failover cluster.
However, hypervisor-level replication doesn't provide real-time failover capabilities. In the case of Hyper-V, it's possible to fail over to a replica VM, but doing so is a manual process that results in a brief amount of downtime. Still, for some workloads, an enterprise can tolerate a brief outage.
Finally, it's also acceptable to use a stand-alone virtualization host when redundancy exists at the VM level. Consider, for instance, an environment in which three domain controllers exist on three separate virtualization hosts. The lack of failover clustering doesn't present an undue risk in this case, because the domain controllers are redundant. This is also true with guest clusters; if workloads are clustered at the guest level, then host-level clustering provides an extra safety net, but it isn't the only defense against an outage.
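The VM-level redundancy argument can be checked mechanically. The following is a minimal Python sketch, not a tool from any hypervisor vendor: it takes a hypothetical placement map of VMs to workloads and hosts, and reports which hosts are a single point of failure for some workload. The VM, workload and host names are made up for illustration.

```python
from collections import defaultdict


def workloads_lost_on_host_failure(placement, failed_host):
    """Return the workloads that have no surviving instance if failed_host goes down.

    placement maps a VM name to a (workload, host) tuple. All names are
    hypothetical; nothing here queries a real hypervisor.
    """
    hosts_by_workload = defaultdict(set)
    for workload, host in placement.values():
        hosts_by_workload[workload].add(host)
    # A workload is lost only if every one of its instances ran on the failed host.
    return sorted(
        workload
        for workload, hosts in hosts_by_workload.items()
        if hosts == {failed_host}
    )


def single_points_of_failure(placement):
    """Map each host to the workloads that would be entirely lost if it failed."""
    hosts = sorted({host for _, host in placement.values()})
    result = {}
    for host in hosts:
        lost = workloads_lost_on_host_failure(placement, host)
        if lost:
            result[host] = lost
    return result


# Three domain controllers spread across three stand-alone hosts,
# plus one file server that exists only on host-a.
placement = {
    "dc1": ("domain-controller", "host-a"),
    "dc2": ("domain-controller", "host-b"),
    "dc3": ("domain-controller", "host-c"),
    "files1": ("file-server", "host-a"),
}

print(single_points_of_failure(placement))
# {'host-a': ['file-server']}
```

In this example the domain controllers survive the loss of any one host, so the hosts that carry only domain controllers are not flagged. The file server has no second instance, which makes host-a a single point of failure for it.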
It's always best to join virtualization hosts to a failover cluster whenever possible. In the real world, technical or budgetary constraints could prevent the use of failover clustering. In these situations, look for some other way to protect workloads in the event of a host outage. IT teams can configure the free version of Hyper-V, for example, as a failover cluster. VM replication and guest clustering are two other options. The important thing: Don't allow a host server to become a single point of failure.