Protecting a virtual environment is no simple matter. As virtual infrastructures become more complex and more companies turn to multiple hypervisors, understanding what you need to protect your data center can be difficult. Luckily, today's companies have more choices than ever when it comes to disaster recovery products. Companies should start by weighing disaster recovery options and choosing an approach that fits their needs.
This month, we ask our Advisory Board members what they think is the best way to protect virtual machines in disaster recovery scenarios and how they would advise others to choose a product or approach.
Rob McShinsky, Dartmouth-Hitchcock Medical Center
Yes, simple backups can provide a level of disaster recovery, but they quickly become cumbersome at scale. VM replication technologies will be the best bet as your number of VMs increases. What will determine your choice of replication technology? Your company's tolerance for downtime and the cost of the product needed to guarantee that service level. Reaching a clear definition of what downtime means to your company -- and how much is acceptable -- is the critical step management needs to take when setting realistic expectations for technical personnel in the event of a disaster.
Most major SAN vendors provide synchronous and asynchronous replication scenarios for virtualized environments. The two that I am familiar with are HP and EMC. These tend to be the costly options, but you may already be licensing these for your critical physical infrastructure needs.
Software-based VM replication is where I see most companies focusing for DR. Both Microsoft (Hyper-V Replica) and VMware (vSphere Replication) have built asynchronous replication products into their hypervisors that allow you to replicate application-consistent copies of your VMs to an alternate site. The Hyper-V Replica core can also be augmented with a Windows Azure disaster recovery coordinator called Hyper-V Recovery Manager.
With more companies using multiple hypervisors, a number of vendors can replicate VMs from multiple hypervisors to give you a common look and feel. Vision Solutions' Double-Take Availability product and Veeam Backup and Replication are two of the better known.
For larger organizations, multiple VM replication options will probably be the choice because your VMs will run the gamut of importance to the company. Regularly reassessing how critical each VM is will determine which of the replication products you choose, and which of their capabilities, that VM falls under.
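As a rough illustration of matching VMs to replication options by criticality, the sketch below picks the least expensive protection tier that still meets a VM's tolerance for data loss and downtime. The tier names follow the options discussed above, but the RPO and RTO figures, the cost ordering and the `choose_tier` helper are all assumptions made up for the example, not vendor specifications.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_minutes: float  # assumed worst-case data loss the tier delivers
    rto_minutes: float  # assumed worst-case time to restore service


# Hypothetical tiers, listed cheapest first. The numbers are placeholders.
TIERS_CHEAPEST_FIRST = [
    Tier("nightly backup", 1440, 1440),
    Tier("asynchronous VM replication", 15, 60),
    Tier("synchronous SAN replication", 0, 15),
]


def choose_tier(
    required_rpo_minutes: float,
    required_rto_minutes: float,
    tiers: Sequence[Tier] = TIERS_CHEAPEST_FIRST,
) -> Optional[Tier]:
    """Return the cheapest tier that meets both requirements, or None."""
    for tier in tiers:
        meets_rpo = tier.rpo_minutes <= required_rpo_minutes
        meets_rto = tier.rto_minutes <= required_rto_minutes
        if meets_rpo and meets_rto:
            return tier
    return None  # no tier is good enough; the requirement needs revisiting


if __name__ == "__main__":
    # Hypothetical VMs with (tolerable data loss, tolerable downtime) in minutes.
    vms = {"test-lab": (1440, 2880), "intranet": (30, 120), "billing-db": (0, 15)}
    for vm, (rpo, rto) in vms.items():
        tier = choose_tier(rpo, rto)
        print(vm, "->", tier.name if tier else "no suitable tier")
```

Rerunning a check like this whenever a VM's importance changes is one way to keep the tier assignments from drifting out of date.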
Brian Kirsch, Milwaukee Area Technical College
There are so many disaster recovery options today that can perform real-time replication, virtual machine failover and a host of other features. Many of these products keep outages as short as possible or, in some situations, avoid them entirely. The biggest challenge is deciding what to choose. Depending on the vendor you select, it might be a storage, virtual machine or network-focused approach. We know that all of these pieces are needed to make the modern software-defined data center function, but does anyone truly know all of the relationships that all of the pieces in your data center have with each other?
I have been through a few large-scale outages myself, caused by a natural disaster and by someone running into an ill-placed power shunt, and the biggest challenge wasn't the technology of the disaster recovery product, but the lack of understanding of how everything fit together. With so many critical infrastructure pieces now in the virtual environment, such as DNS, DHCP and Active Directory, understanding how these relationships work and interact with each other is becoming more critical. Modern companies should be documenting all aspects of a data center and keeping a hard copy in the event of a serious issue. This type of documentation is difficult to maintain with rapidly changing environments and becomes even harder to incorporate when you consider the challenge of personnel changes.
Before a company chooses which disaster recovery product will fit best, it may need to start by finding out what pieces it needs to protect. Simply asking your staff won't cut it.
One of the products showcased at VMworld 2013 was IT Continuity Architect from Neverfail, but this product wasn't designed to provide disaster recovery features. It documents the relationships between your business applications and works with existing disaster recovery products, such as VMware High Availability, Fault Tolerance, Site Recovery Manager, vSphere Replication and Microsoft Clustering to give you a detailed look into your data center. Neverfail provides that deep look into which servers and applications are talking to each other and whether it is possible to adhere to the service level and protection tier that you and your clients need.
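To make the idea of documenting relationships concrete, here is a minimal sketch of a dependency map and the recovery order it implies. It is not how IT Continuity Architect or any other product works; the service names and the dependencies between them are hypothetical, and the ordering uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each service lists what must be up before it.
DEPENDS_ON = {
    "dns": set(),
    "dhcp": {"dns"},
    "active-directory": {"dns"},
    "file-server": {"active-directory"},
    "web-app": {"active-directory", "dns"},
}


def recovery_order(depends_on):
    """Return services in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    try:
        for step, service in enumerate(recovery_order(DEPENDS_ON), start=1):
            print(step, service)
    except CycleError as err:
        # A cycle means two services each wait on the other; the
        # documentation itself needs fixing before it can guide a recovery.
        print("circular dependency:", err.args[1])
```

Even a small map like this, kept current and printed out, answers the question that tends to stall a recovery: what has to come back first.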
Jason Helmick, Concentrated Technology LLC
What is the best way to protect your VMs in disaster recovery scenarios?
Without getting vendor specific, disaster recovery has come a long way in recent years. The old tactic of backup to disk was never really a good plan; it was just the only plan. The data lost between the last backup and the time of failure was often as much of a problem as the failure itself.
The goal has been to get a backup closer to the failure point. Over the years, features have gradually appeared in both virtualization platforms and hardware to help with this goal, but there is still a gap.
The answer to great disaster recovery today lies in replication technologies. The idea is to replicate your VMs -- in real time -- to another site. Under this method, the DR site maintains a near-perfect copy of the production VM. Most replication models use tightly controlled commit processes to ensure that both the master and replica VMs stay current. For the IT pro, this is an easy and effective tactic for addressing the larger strategic problem of disaster recovery.
Not all DR cases require the use of a replication strategy or, in fact, any tactical solution provided by the virtualization platform. Keep in mind that many products, such as Microsoft Exchange with database availability groups, have high-availability features that do not require the VMs to be replicated or backed up. In the case of Exchange, doing so would actually be a waste of time, and many of these products also have their own disaster recovery capabilities.
It all boils down to the IT pro understanding the needs of the services and applications when it comes to disaster recovery. Sometimes the application itself provides everything; sometimes it's up to the platform -- such as replication to a DR site -- to solve the problem. The key in building a good strategy is to have multiple tools in your toolbox and to know when to use them.
Dave Sobel, Level Platforms Inc.
Disaster recovery in the virtualization world shares many of the same themes as in the physical world. Understanding your recovery point objective (RPO) and recovery time objective (RTO) is the first key decision point. How much do you have to recover and how quickly do you have to recover it? Managing data restores of individual files and folders is still as important as complete DR -- you need a plan to restore a single Word file or email rapidly.
First, focus on restores from within the VM itself, generally done with a file-based backup approach. Once the basics of backup are taken care of, you can move on to understanding how to keep backups of the VMs themselves. Beyond that, develop a plan to use failover technologies to keep VMs up and running in the event of disaster. Shorter RPOs and RTOs will generally drive cost up, so understanding your cost of downtime will help reconcile cost against the benefit.
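The trade-off between cost of downtime and cost of protection can be sketched with some simple arithmetic. Everything in the example below is an assumption for illustration: the dollar figures, the outage frequency and the two hypothetical options are invented, and the model ignores the cost of lost data (the RPO side) to stay short.

```python
def expected_annual_loss(downtime_cost_per_hour, rto_hours, outages_per_year):
    """Expected yearly cost of downtime if each outage lasts the full RTO."""
    return downtime_cost_per_hour * rto_hours * outages_per_year


def total_annual_cost(option_cost, downtime_cost_per_hour, rto_hours,
                      outages_per_year):
    """Yearly cost of a DR option plus the downtime it still leaves you with."""
    loss = expected_annual_loss(downtime_cost_per_hour, rto_hours,
                                outages_per_year)
    return option_cost + loss


if __name__ == "__main__":
    # Hypothetical inputs: $10,000 per hour of downtime, one outage every two years.
    cost_per_hour = 10_000
    outages_per_year = 0.5

    # Hypothetical options: (annual cost in dollars, RTO in hours).
    options = {
        "file-based backup only": (5_000, 24),
        "VM replication with failover": (60_000, 1),
    }
    for name, (annual_cost, rto_hours) in options.items():
        total = total_annual_cost(annual_cost, cost_per_hour, rto_hours,
                                  outages_per_year)
        print(f"{name}: {total:,.0f} per year")
```

With these made-up numbers the more expensive option comes out ahead, but a lower cost of downtime would reverse the result, which is why the cost-of-downtime estimate has to come first.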