Server virtualization brings far better system utilization, workload flexibility and other benefits to the data center. Today, most organizations have adopted virtualization to some extent. But in spite of its benefits, virtualization is not perfect -- the hypervisors themselves are sound, but the practices and policies that arise from virtualization can waste resources and drive administrators to the breaking point. Let's look at the top five problems with virtualization and consider tactics to address them.
Virtual machine sprawl wastes valuable computing resources
It's common for an organization to go through all the time and trouble to virtualize 100 workloads onto a handful of servers and, six months later, have to buy more servers to operate 200 workloads that are actually running in the environment. What's going on?
Before virtualization, deploying a new server took weeks (if not months) to plan and budget for systems, deployment and so on -- bringing a new workload online was a big deal that IT pros and managers scrutinized. With virtualization, a hypervisor can allocate computing resources and spin up a new virtual machine (VM) on an available server in less than 15 minutes. The problem is that companies usually don't have the policies in place to plan or manage those "quick and easy" VMs. And once VMs are in the environment, there are rarely any processes in place to tell whether those VMs are still used or needed. The result is that VMs simply accumulate over time and suck up computing resources as well as backup and disaster recovery resources. Many organizations have little idea why they're running out of computing power.
VMs are easy to start and easy to delete. Businesses need policies and procedures to track VMs using lifecycle management tools. Understand why a new VM is needed -- justify it as if it were a new server. Understand how long it will be needed. There should be clear review dates and removal dates so that you can either extend or retire the VM. All of this helps to tie VMs to departments or other stakeholders so that you can see how much of the environment that part of the business is demanding. Some businesses even use chargeback tactics to bill departments for the amount of computing that they use.
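The lifecycle tracking described above can be as simple as an inventory with owners and review dates that gets checked regularly. Here is a minimal sketch in Python; the inventory records, field names and VM names are illustrative assumptions -- in practice this data would come from a CMDB or the hypervisor's management API:

```python
from datetime import date

# Hypothetical VM inventory records -- in a real shop these would come
# from a CMDB or a hypervisor management API, not a hard-coded list.
inventory = [
    {"name": "web-01",  "owner": "marketing", "review": date(2024, 1, 15)},
    {"name": "db-test", "owner": "dev",       "review": date(2023, 6, 1)},
]

def overdue_vms(inventory, today):
    """Return names of VMs whose scheduled review date has passed."""
    return [vm["name"] for vm in inventory if vm["review"] < today]

# VMs flagged here are candidates for extension or retirement.
print(overdue_vms(inventory, date(2024, 1, 1)))  # -> ['db-test']
```

Grouping the same records by owner gives the per-department usage view that chargeback schemes rely on.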
VMs can congest network traffic
Network congestion is another common problem. After running all the numbers, you see you have enough memory and CPU cores to fit 25 VMs on a single server, but then you find out that the server's only network interface card (NIC) port is always saturated, causing some VMs to report network errors. Some of the VMs just can't communicate.
Before virtualization, a single application on a single server would typically use only a fraction of the server's network bandwidth. But as multiple VMs take up residence on the virtualized server, each VM on the server will demand some of the available network bandwidth. Most servers are only fitted with a single NIC port, and it doesn't take long for network traffic on a virtualized server to overwhelm the NIC. Workloads sensitive to network latency may report errors or even crash.
Standard gigabit Ethernet ports can typically support traffic from several VMs, but IT professionals planning high levels of consolidation may need to upgrade servers with multiple NIC ports to provide adequate network connectivity. You can sometimes relieve short-term traffic congestion problems by rebalancing workloads to spread out bandwidth-hungry VMs across multiple servers.
Remember that NIC upgrades may also demand additional switch ports or switch upgrades. In some cases, the traffic from multiple NICs may need to be distributed across multiple switches to prevent switch backplane saturation. This will require the attention of a network architect who should be involved in the virtualization and consolidation effort from the earliest planning phase.
Consolidation will multiply the impact of server hardware failures
Consider 10 VMs all running on the same physical server. Virtualization provides tools like snapshots and live migration that can effectively protect VMs and ensure their continued operation under normal conditions. But virtualization does nothing to protect the underlying hardware. So what happens when the server itself fails?
The single physical hardware platform becomes a single point of failure -- all of the workloads running on the platform will be affected. Greater levels of consolidation mean more workloads on each server, and more workloads will be impacted by server failures. This is very different from traditional physical deployments, in which a single server supported one application.
In a properly architected and deployed environment, the affected workloads will fail over and restart on other servers, but there will be some disruption to the workloads' availability during the restart. Remember that the workload must restart from a snapshot in storage and move from disk to memory on an available server. The process may take several minutes, depending on the size of the image and the amount of traffic on the network (an already congested network may take much longer to move the snapshot into another server's memory).
There are several tactics for mitigating server hardware failures. In the short-term, IT administrators can opt to redistribute workloads to prevent multiple critical applications from residing on a single server. It might also be possible to lower consolidation levels in the short-term to limit the number of workloads on each physical system.
Over the long-term, deploy high-availability servers for important consolidation platforms. These servers may include redundant power supplies and numerous memory protection technologies like memory sparing, memory mirroring and so on. These server hardware features help to prevent errors or at least prevent them from becoming fatal. In addition, the most critical workloads may reside on server clusters which keep multiple copies of each workload in synchronization. If one server fails, another node in the cluster takes over and continues operation without disruption.
Application performance can still be marginal in a VM
So you hear nothing but good things about virtualization and decide to move your 25-year-old custom-written corporate database server into a VM. Then you discover that the database performs just slightly slower than molasses. Or perhaps you virtualize a modern application and see that it runs erratically or is just "slow." There are several possibilities when it comes to VM performance problems.
Let's look at older or in-house/custom-built applications first. One of the most efficient ways to code software is to use specific hardware calls. Unfortunately, any time you change the hardware or abstract the hardware from the application (e.g., virtualization), the software may not work correctly and usually needs to be re-coded. It's possible that your antique software simply isn't compatible with virtualization. In that case you will need to update it, switch to a commercial product that does the same job, or keep running it on the old physical system it has occupied all these years -- none of these is a particularly attractive option for an enterprise on a tight budget.
When you're faced with a more modern application that is just not performing well after virtualization, it may simply be that the workload is not receiving the appropriate amount of computing resources such as memory space, CPU cycles or cores and so on. You can typically run a benchmark utility and identify any resources that are over-utilized and then provision additional computing resources to provide some slack. The application's performance should improve. One example is memory. If memory is too tight, the application may rely on disk file swapping, which can really slow performance. Adding enough memory to avoid disk swapping can perk up performance substantially.
In both cases, some advance testing in a lab environment could have helped to identify troublesome applications and given you the opportunity to formulate solutions before rolling the VM out into production.
Software licensing is a slippery slope in a virtual environment
Why is it that we have no problem paying for a license when we install a critical application on a server, but seem to think it's perfectly OK to clone that server onto 1,000 VMs for free?
Software licensing was always confusing and expensive, but software vendors are quickly catching up with virtualization technology and updating their licensing rules to account for VMs, multiple CPUs and other resource provisioning loopholes that virtualization allows. The bottom line is that you cannot expect to clone VMs without buying licenses for the operating system and application running within that VM.
Always review and understand the licensing rules for any software that your organization deploys. A large enterprise may even retain a licensing compliance officer to track software licensing and offer guidance for software deployment (including virtualization). Involve these professionals if they are available.
And remember that license breaches can expose your organization to litigation and substantial penalties -- major software vendors often reserve the right to "audit" your organization and verify your licensing. Most vendors are simply more interested in getting their licensing fees than in launching litigation, especially for "first offenders." But when you consider that a single license may cost thousands of dollars, careless VM proliferation (see VM sprawl above) can cripple an organization financially.
Server virtualization has changed the face of modern corporate computing, allowing efficient use of computing resources on fewer physical systems, yet providing more ways of protecting data and ensuring availability. But virtualization is not perfect, and it creates new problems that IT professionals must understand and address to keep the data center running smoothly.