Virtualization errors happen to everyone

Even top experts make mistakes when it comes to installing and managing server virtualization. As Halloween approaches, our Advisory Board members share their horror stories.

Virtualization errors are a fact of life for systems administrators and consultants. No matter how well you prepare for a deployment or how carefully you manage your environment, unexpected problems are bound to pop up.

Virtualization errors can result in server downtime and, in extreme cases, data loss. They can also keep you from expanding your deployment and realizing the advanced benefits of server virtualization.

Members of our Server Virtualization Advisory Board talk about some of the virtualization errors they've made as they answer this question:

What's the biggest mistake you've ever made while deploying or managing a virtual environment, and how did you fix it?

Jack Kaiser, GreenPages Technology Solutions

I asked our director of solution engineering, Brian Gagnon, to respond to this one. He leads our Virtualization Implementation Practice.

Most of the mistakes we've seen customers make when virtualizing the data center come from applying physical-environment policies, procedures and designs instead of adapting to the new features and abilities that virtualization brings to the table. We've seen two types of customers stall their virtualization efforts: those that move too fast without a solid plan to scale, manage and operate; and those that get caught in design paralysis, when "business as usual" becomes a barrier to virtualization adoption.

Both of these scenarios lead to a lower percentage of virtualization throughout the customer's data center. I call this "Too Much, Too Soon/Too Little, Too Late Syndrome." By pushing virtualization into production without a solid operational, scalable, flexible architecture, customers tend to hit a point where they actually "un-virtualize" applications. Full virtualization can be realized only if the environment is designed to meet the needs of the business, which in some cases moves beyond straight consolidation ratios.

Rob McShinsky, Dartmouth Hitchcock Medical Center

My biggest mistake happened when trying to rebuild three of six Hyper-V virtual host cluster nodes. The eviction, rebuild and addition of each of the nodes back into the cluster went flawlessly, but in my haste to get the systems back into the cluster, I failed to remove them from the domain policy, which automatically installs monthly Microsoft patches.

If I had forgotten only one of the nodes, it would have been fine, since the VMs would have migrated over to the other five nodes. But with three of the six nodes rebooting and trying to send their VMs to other nodes, the VMs had nowhere to go, and things got a little messy. Luckily there was no data loss, but about half of the 100 VMs in the cluster experienced an unexpected shutdown. What did I learn? Do not rush when configuring your hosts. The stakes are high, and a small mistake can haunt you.
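The arithmetic behind that failure can be sketched in a few lines of Python. This is only an illustration, not how Hyper-V failover clustering places VMs: the node names, the even spread of 100 VMs over six nodes, and the per-node limit of 20 VMs are all assumptions made for the example.

```python
def displaced_vms(vms_per_node, capacity_per_node, nodes_down):
    """Count VMs that have nowhere to go when `nodes_down` go offline together.

    vms_per_node: dict mapping node name -> number of VMs it currently runs.
    capacity_per_node: hypothetical maximum VMs a single node can host.
    nodes_down: set of node names that are rebooting at the same time.
    """
    # VMs that need a new home.
    to_move = sum(vms_per_node[node] for node in nodes_down)
    # Free slots left on the nodes that stay up.
    spare = sum(
        max(capacity_per_node - count, 0)
        for node, count in vms_per_node.items()
        if node not in nodes_down
    )
    return max(to_move - spare, 0)


# Assumed layout: 100 VMs spread over six nodes, 20 VMs per node at most.
cluster = {"node1": 17, "node2": 17, "node3": 17,
           "node4": 17, "node5": 16, "node6": 16}

print(displaced_vms(cluster, 20, {"node1"}))                    # 0
print(displaced_vms(cluster, 20, {"node1", "node2", "node3"}))  # 40
```

Under these assumed numbers, one node can drain completely into the other five, but three nodes rebooting together leave 40 VMs without a host. Running a check like this before a patch window makes it obvious how many nodes can safely restart at once.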

Shannon Snowden, New Age Technologies

One of the biggest mistakes I commonly see as a consultant is the painfully obvious lack of knowledge prior to a virtualization project deployment.

Two reasons usually surface:

  1. Lack of understanding of what they bought: Companies often spend on the technology without investing in preparing their people. Hypervisors are erroneously categorized as just another operating system, and either the Windows or the UNIX team gets ownership.
  2. Arrogance or fear among the IT staff: Sometimes lack of training isn't the company's fault. I've actually seen IT staff turn down training or consulting when offered by management. Maybe it's arrogance or fear that drives them to not want to admit to their bosses they need training.

I try to prepare both management and IT staff to appreciate that virtualization introduces a significant opportunity to transform how the data center is managed. That means new skills, new teams and new methodologies for the organization.

Rick Vanover, Alliance Data

My biggest mistake was working on a neighboring server and inadvertently knocking out power to a production host. The impact was felt, but protections were in place to minimize it. Hypervisor-specific protections such as high availability (HA) may not be enough for this type of event. Luckily, the design of the environment accommodated it by using virtual IP addresses (VIPs) and distributed application environments.

For critical virtualized systems, blending application architecture with virtualization architecture can protect against unexpected outages. By using VIPs, applications such as Web services and terminal servers can be spread around a virtual environment and separated with rules. Further, VIPs are a natural step for distributing an application's workload across data centers.

In my example mistake above, the applications affected were spread across hypervisors and data centers, minimizing the outage. This event reinforced the importance of always building protections beyond HA into virtualized environments.
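The idea of spreading an application's instances so that one host failure cannot take them all down can be sketched as follows. This is a simplified illustration, not any hypervisor's separation-rule engine or a real VIP configuration; the host names, instance names and round-robin placement are assumptions made for the example.

```python
from itertools import cycle


def place_instances(instances, hosts):
    """Assign instances to hosts round-robin so replicas land on different hosts."""
    return {instance: host for instance, host in zip(instances, cycle(hosts))}


def survivors(placement, failed_host):
    """Return the instances still running after `failed_host` loses power."""
    return sorted(i for i, host in placement.items() if host != failed_host)


# Hypothetical hosts in two data centers and three replicas of one Web service.
hosts = ["dc1-host-a", "dc2-host-a", "dc1-host-b"]
placement = place_instances(["web-1", "web-2", "web-3"], hosts)

print(placement["web-1"])                   # dc1-host-a
print(survivors(placement, "dc1-host-a"))   # ['web-2', 'web-3']
```

With a VIP in front of the replicas, traffic can keep flowing to the surviving instances while the failed host is recovered, which is the kind of protection beyond HA described above.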

Have an idea for a future Server Virtualization Advisory Board question? Email Colin Steele, Site Editor.
