While you can use virtualization to achieve more with less, even with mission-critical workloads, you have to know what you are doing. One challenge to a virtual-first strategy lies in the management tools, particularly hypervisor management tools like VMware's vCenter Server. For example, say you have a virtualized vCenter Server in your data center. VMware's Distributed Resource Scheduler (DRS) is doing its job and performing live migrations of workloads to keep physical server use balanced.
Now, trouble begins. First, one server fails. Then, another server fails. VMware's High Availability (HA) feature kicks in and begins to restart the VMs from the failed hosts elsewhere in the data center. Then catastrophe hits and all of the hosts go down. Once the root cause is resolved, you will need to log into vCenter and boot up your VMs. But wait … where is vCenter? DRS was moving it around, then HA tried to recover it, and then it simply disappeared. You have to find the vCenter VM and its Microsoft SQL database, and then restore them so they can help restore the remaining servers.
This is not a failure of the software, or even a flaw in the virtual-first strategy. It is a flaw in the design and configuration of the environment.
You have to mitigate a few risks when you place the virtualization management tools inside the same virtual environment they are managing. First, isolate the management tools. This can mean placing them in a small cluster dedicated to management tools or leveraging features like VMware's DRS groups and DRS affinity rules.
With DRS groups, you can create DRS affinity rules that require the vCenter Server and the Microsoft SQL Server it relies on to always be hosted on the same physical host. If one moves, both must move.
Further, you can restrict them to one or two physical hosts in the data center. With those rules in place, the sequence of failures that led to the outage won't matter because you know that your management tools are together and on one of two hosts.
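To make the intent of those two rules concrete, here is a minimal Python sketch that checks a placement map against them. It does not call vCenter or any VMware API; the function name, the VM names and the host names are all hypothetical, and the placement map is assumed to come from whatever inventory source you already have.

```python
from typing import Dict, List, Set


def check_management_placement(
    placement: Dict[str, str],
    keep_together: List[str],
    allowed_hosts: Set[str],
) -> List[str]:
    """Return a list of rule violations; an empty list means compliant.

    placement maps VM name -> host name (hypothetical inventory data).
    """
    violations: List[str] = []

    # Keep-together rule: every VM in the group must share one host.
    hosts_in_use = {placement[vm] for vm in keep_together}
    if len(hosts_in_use) > 1:
        violations.append(
            "keep-together VMs are split across hosts: "
            + ", ".join(sorted(hosts_in_use))
        )

    # Host restriction: each VM must sit on one of the allowed hosts.
    for vm in keep_together:
        if placement[vm] not in allowed_hosts:
            violations.append(
                f"{vm} is on {placement[vm]}, outside the allowed hosts"
            )

    return violations


# Hypothetical inventory: vCenter and its database share an allowed host.
placement = {"vcenter": "esx01", "vcenter-sql": "esx01", "web01": "esx03"}
print(check_management_placement(
    placement, ["vcenter", "vcenter-sql"], {"esx01", "esx02"}
))  # prints []
```

An empty result means that, after an outage, you only have to look on the allowed hosts to find both VMs.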
Depending on how many management tools you have in place, a dedicated cluster may not be the best use of resources. However, it accomplishes the same goal by allowing you to quickly locate and recover the management tools that will then help you recover everything else.
You may have a similar situation with other combinations of applications that need to be kept together. Conversely, you likely have applications that need to be kept apart. You may run multiple Web servers to satisfy high-availability requirements. However, if all of those Web servers are virtualized and end up on the same host, then a single host failure could bring them all down at once.
Just as DRS affinity rules keep servers together, DRS anti-affinity rules keep servers apart. A quick rule can guarantee that multiple Active Directory servers or Web servers do not accidentally end up on the same host. Without those rules, a single host failure can cause a serious service interruption.
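As an illustration of what an anti-affinity rule protects against, the following Python sketch scans a placement map for hosts that carry more than one member of a keep-apart group. It is only a model of the rule's logic, not VMware's implementation; the function name and the VM and host names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_anti_affinity_violations(
    placement: Dict[str, str],
    keep_apart: List[str],
) -> Dict[str, List[str]]:
    """Return hosts that carry more than one VM from the keep-apart group.

    placement maps VM name -> host name (hypothetical inventory data).
    """
    vms_by_host: Dict[str, List[str]] = defaultdict(list)
    for vm in keep_apart:
        vms_by_host[placement[vm]].append(vm)

    # Any host with two or more group members is a single point of failure.
    return {
        host: vms for host, vms in vms_by_host.items() if len(vms) > 1
    }


# Hypothetical inventory: two of three Web servers landed on the same host.
placement = {"web01": "esx01", "web02": "esx02", "web03": "esx01"}
print(find_anti_affinity_violations(placement, ["web01", "web02", "web03"]))
# prints {'esx01': ['web01', 'web03']}
```

In this example, losing esx01 would take out two of the three Web servers at once, which is exactly the situation the rule is meant to prevent.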
Too important to be physical
Some applications are highly sensitive to the hardware abstraction of virtualization. And some licensing models make virtualizing an application cost-prohibitive. Still, few of those arguments stand up to close scrutiny anymore.
Processing power, memory and even network speeds have grown to a point where finding an application that cannot perform well in a virtualized environment is like finding a unicorn. The rumors persist, but every sighting turns out to be a horse with something strapped to its head.
When you're dealing with truly mission-critical applications or services, meaning the financial or business impact of downtime is too great to tolerate, your organization needs an environment that's built for uptime. Workloads in that environment must be able to move quickly to avoid resource constraints or even a service outage. Restoring a service must not be tied to an overly specific hardware configuration, which would force a complete reinstall of the OS and software. And you should not have to buy depreciating assets and place them on a shelf just in case they might be required for recovery.
In a virtual world, a 15-minute hypervisor install can turn almost any computer into a host on which to restore services. That is a platform tailor-made for availability and mission-critical applications.