In most cases, virtualization problems will result in some data loss. It may only be a few transactions, but even small virtualization problems can turn into emergencies.
Todd Erickson, chief operations officer and senior vice president of First Flight Federal Credit Union, a financial institution in Havelock, N.C., recalled connectivity trouble involving a 110 TB EMC Symmetrix.
"During some routine maintenance at 3:30 in the afternoon, about 500 servers lost connectivity to the storage device for about 30 seconds," he said. "That's near catastrophic."
How to prevent data loss
Given the potential for serious production disruptions, the best way to deal with virtualization problems is to do everything possible to prevent data loss and downtime in the first place. This starts with proof-of-principle testing for every workload long before an actual deployment or change is made.
Test each application under conditions that approximate a live environment as closely as possible. Measure the workload's computing demands, plan its physical host location and know in advance where the VM can be migrated or restarted.
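As a rough illustration of that planning step, the sketch below lists the hosts that could accept a workload if it had to be migrated or restarted. It is only a minimal example: the `Host` fields, the host names, the 20% headroom and all of the figures are hypothetical, and a real plan would draw its numbers from measured workload data and your own management tools.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpu_ghz: float  # unused CPU capacity on the host (hypothetical figure)
    free_ram_gb: float   # unused memory on the host (hypothetical figure)


def restart_candidates(hosts, cpu_ghz, ram_gb, headroom=0.2):
    """Return the names of hosts that can take the workload with spare headroom."""
    need_cpu = cpu_ghz * (1 + headroom)
    need_ram = ram_gb * (1 + headroom)
    return [
        h.name
        for h in hosts
        if h.free_cpu_ghz >= need_cpu and h.free_ram_gb >= need_ram
    ]


# Hypothetical inventory and a workload measured at 4 GHz of CPU and 16 GB of RAM
hosts = [
    Host("host-a", 8.0, 32.0),
    Host("host-b", 2.0, 64.0),
    Host("host-c", 6.0, 12.0),
]
print(restart_candidates(hosts, cpu_ghz=4.0, ram_gb=16.0))  # ['host-a']
```

An empty list is the useful result here: it means the workload has nowhere to go if its host fails, which is better discovered during testing than during an outage.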
"A [quality assurance] environment will definitely be necessary even if you couldn't duplicate the virtual SAN that you have on the back end," said Ty Hacker, director of technical services at I-Business Network LLC, a financial/accounting Software as a Service provider located in Marietta, Ga.
Reaching out to vendors and others that use the technology successfully can reveal tactics and best practices that can prevent data loss, downtime and other virtualization problems during a real deployment. Another way to forestall the effect of problems is to use high-availability (HA) techniques for mission-critical workloads.
Using HA to prevent data loss and downtime
HA may include virtualized server technologies that host redundant VMs on two or more physical servers, using specialized software to synchronize both instances so that one can take over for the other when a server disruption occurs. For less critical workloads, use techniques that will fail over workloads to prescribed servers or even spin up failed workloads from storage.
Administrators should test their HA deployments on a regular basis to be certain that the processes work as expected, especially if any attribute of the data center has changed. For example, consider a cluster of three active servers providing added performance for a mission-critical application. Test the deployment to ensure that the two remaining servers will provide the necessary level of performance if one of the three fails. This may mean pulling a network cable or pressing a power button to simulate a serious issue.
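Before pulling a cable, the three-server scenario can be checked on paper. The sketch below is a minimal example of that arithmetic, assuming each server's capacity and the application's peak demand can be expressed in the same unit; the function name and the requests-per-second figures are hypothetical. It supplements a live failover test and does not replace one.

```python
def survives_single_failure(node_capacities, peak_demand):
    """Return True if the cluster still meets peak_demand after losing its largest node."""
    if len(node_capacities) < 2:
        return False  # a single node has nothing to fail over to
    remaining = sum(node_capacities) - max(node_capacities)
    return remaining >= peak_demand


# Hypothetical cluster: three servers rated at 1,000 requests per second each
print(survives_single_failure([1000, 1000, 1000], 1800))  # True
print(survives_single_failure([1000, 1000, 1000], 2400))  # False
```

Subtracting the largest node models the worst case. In the second call the cluster handles 2,400 requests per second with all three servers running but falls short once any one of them is lost.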
Redundant SAN access should be an integral part of any virtual data center and should be supported aggressively with snapshot and replication processes. But don't overlook the importance of local storage on the servers themselves.
"I know a lot of virtualization [deployments] will literally boot straight from a SAN, which is never supposed to have a problem -- but it does," Erickson said.
Either having local disks available or running mirrored local drives can be useful, he added.
Virtualization: The point of no return
There's just no practical way to undo the abstraction that virtualization introduces -- short of restoring a pre-virtualization backup. Organizations protect themselves from deployment and expansion problems by starting small, gaining expertise by virtualizing noncritical workloads and then systematically extending the technology to more important workloads.
Ultimately, the decision to deploy virtualization is now a point of no return for most organizations.
"In my environment and a couple of other places I've worked, there's no way we would ever consider going back to physical under any circumstances," Erickson said.
So rather than dumping virtualization when a problem occurs, it is important to develop solid troubleshooting methodologies using management tools that provide a level of insight and control that is appropriate for your IT needs.
"Be very familiar with the command line," Hacker said. "Have tools that will allow you to attach to the host directly versus a Web-based or client management console."
Much of the functionality necessary to address serious host problems under environments like Citrix or VMware is not available through a streamlined GUI, he said.
Although vendors can be a resource to help resolve virtualization problems, experts warn against over-reliance on them.
"Your technology team really needs to understand what the moving parts do," Erickson said. "The virtualization vendor doesn't care about your business and your data like you do."
IT personnel should be the "first responders" to any virtualization issue because they can solve relatively minor problems quickly. They are also able to communicate more substantial problems to vendors faster and more effectively than if they waited for an outside technician to arrive to resolve the issue.
In some cases, support from a third-party provider or VAR can be more effective than dealing directly with the vendor, particularly if the VAR had a hand in deploying your virtualization in the first place. The VAR can always reach out to vendors if the need arises.
About the author
Stephen J. Bigelow, a senior technology writer in the Data Center and Virtualization Media Group at TechTarget Inc., has more than 15 years of technical writing experience in the PC/technology industry. He holds a bachelor of science in electrical engineering, along with CompTIA A+, Network+, Security+ and Server+ certifications, and has written hundreds of articles and more than 15 feature books on computer troubleshooting, including Bigelow's PC Hardware Desk Reference and Bigelow's PC Hardware Annoyances. Contact him at firstname.lastname@example.org.
This was first published in September 2010