Saying that virtualization makes disaster recovery (DR) less complex is easy. It does. But using virtualization to eliminate traditional stumbling blocks isn't. A well-devised disaster recovery strategy in a virtual environment requires a plan that exploits virtualization's hardware transparency, system mobility and automation benefits.
Organizations' disaster recovery plans and tests vary. Some IT shops fully test DR plans on a quarterly basis by validating all steps in a plan, while for others the test consists of reading the plan but not acting on it.
DR can be a complex undertaking. When hardware differs between production and recovery sites, recovering critical systems is often fraught with roadblocks, most of which stem from the differing firmware and device driver requirements that the hardware mismatch creates.
Why do some organizations restrict disaster recovery testing? Here's my favorite reply -- and one that I've heard more than once: "Because they are bad for morale."
Why do IT shops believe disaster recovery testing is bad for morale? Often a DR test takes days to complete and can set an IT staff pretty far back in terms of its "real work."
But beware of the time-sink excuse. DR is a serious issue, and any organization that doesn't have confidence in its disaster recovery plan is taking a serious gamble. That's why virtualization is so important. The hardware abstraction of virtualization platforms makes it relatively easy to overcome differences in hardware between production and recovery sites.
Virtualization reduces hardware needs
Virtualization changes how we plan for and execute a disaster recovery strategy. With virtualization, you can reduce the amount of hardware required at the disaster recovery site -- assuming that you're preparing for the loss of only a single site. The hardware abstraction afforded by virtualization removes the need for the recovery site to duplicate the hardware at the production sites.
When it comes to disaster recovery, data center managers have plenty to worry about. Organizations that virtualize production resources will likely need to adjust their disaster recovery plans after a server virtualization migration. How does virtualization change things? What should you do differently? The six items below are considered requirements for disaster recovery in virtual environments:
- Hypervisor configuration settings
- Virtual machine (VM) configuration settings
- Shared I/O
- VM snapshots
- Data staging at the DR facility
- Disaster recovery automation
Hypervisor configuration settings
Hypervisors such as VMware ESX Server and Citrix Systems Inc.'s XenServer store configuration settings locally on their physical host system. Configuration settings determine how the hypervisor accesses compute, storage and network resources and also dictate how shared resources are presented to VMs.
In virtual environments, two of the most critical configuration settings are the virtual network and storage configurations. Many organizations have scripts in place that can recreate critical hypervisor configuration settings; however, in the absence of configuration scripts, you need to ensure that all configuration settings are regularly backed up. There are two common techniques for doing so:
- Installing a backup agent on the hypervisor console.
- Exporting the hypervisor configuration settings to a network share (such as a Network File System mount) and then backing up the settings from the share.
How you back up copies of the hypervisor settings isn't important; backing them up on a regular basis is.
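As a rough illustration of the second technique, here is a minimal Python sketch that copies a directory of already-exported hypervisor settings into a timestamped backup folder and prunes older copies. It assumes the export itself has been done by whatever tool your hypervisor provides; the function name, the "hypervisor-config-" folder prefix and the retention count are hypothetical choices for this example, not part of any vendor's tooling.

```python
import shutil
import time
from pathlib import Path

PREFIX = "hypervisor-config-"  # hypothetical naming convention for this sketch


def backup_exported_settings(export_dir, backup_root, keep=7):
    """Copy exported hypervisor settings into a timestamped folder.

    export_dir  -- directory (for example, on an NFS mount) holding the export
    backup_root -- directory that collects the timestamped copies
    keep        -- how many of the most recent copies to retain (at least 1)
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    export_dir = Path(export_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Timestamped names sort chronologically, which makes pruning simple.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{PREFIX}{stamp}"
    shutil.copytree(export_dir, target)

    # Remove everything except the newest `keep` copies.
    copies = sorted(
        p for p in backup_root.iterdir()
        if p.is_dir() and p.name.startswith(PREFIX)
    )
    for old in copies[:-keep]:
        shutil.rmtree(old)

    return target
```

Run from a scheduler, a script along these lines would address the "regular basis" part; the copies still need to reach your normal backup media or the DR site to be useful in a disaster.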
Virtual machine configuration settings
VM configuration settings define a VM's virtual hardware settings and include the following:
- Virtual network interface settings such as MAC addresses
- Virtual switch association settings
- Storage configuration
- Virtual CPU configuration
Many organizations back up VMs by installing backup agents inside each virtual machine's guest operating system. While backup agents inside a VM's guest OS secure copies of the VM's data quite well, agents won't back up the configuration data that is external to the VM's guest OS. VM configuration data is stored in each VM's associated configuration file (such as a .vmx file with VMware or a .vmc file with Microsoft).
Again, back up VM configuration files on a regular basis, which can be done with the same methods you use to back up hypervisor configuration settings.
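To make the idea concrete, the following Python sketch walks a directory tree, picks out VM configuration files by extension (the .vmx and .vmc extensions mentioned above) and bundles them into a compressed archive that can then be backed up or shipped to the DR site. The directory layout and function names are assumptions made for this example; adjust the paths and extensions to match where your platform actually stores VM configuration files.

```python
import tarfile
from pathlib import Path

# Extensions named in the article; extend this set for other platforms.
CONFIG_SUFFIXES = {".vmx", ".vmc"}


def find_vm_configs(vm_root):
    """Return all VM configuration files found under vm_root, sorted by path."""
    vm_root = Path(vm_root)
    return sorted(
        p for p in vm_root.rglob("*")
        if p.is_file() and p.suffix.lower() in CONFIG_SUFFIXES
    )


def archive_vm_configs(vm_root, archive_path):
    """Write every VM configuration file under vm_root into a .tar.gz archive.

    Paths inside the archive are stored relative to vm_root so the archive
    can be unpacked at a recovery site with a different directory layout.
    """
    vm_root = Path(vm_root)
    configs = find_vm_configs(vm_root)
    with tarfile.open(archive_path, "w:gz") as tar:
        for cfg in configs:
            tar.add(cfg, arcname=cfg.relative_to(vm_root).as_posix())
    return configs
```

Because only the small configuration files are archived, not the virtual disks, this kind of job is cheap enough to run far more often than a full VM backup.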
Virtual machine snapshots
Most virtualization platforms include features that enable you to create live snapshots of running virtual machines. Depending on your storage architecture, live snapshots may be integral to your backup and recovery processes.
VM snapshots should be considered a requirement for DR preparedness and should be part of your change control processes. Each time a VM changes -- through a configuration change, patch installation or software installation, for example -- create a new snapshot of the VM and immediately replicate it to your DR site. VM snapshots should serve as the baseline for all disaster recovery operations. Because the snapshot includes, at a minimum, the VM's most recent OS and application configuration, you need to restore only the most recent data files from backup to fully recover the VM. Of course, if you're using asynchronous replication to synchronize data between the production and DR sites, then you just have to power on each VM at your recovery site and you're all set. (For additional information on using a storage architecture to capture VM snapshots and using replication for DR preparedness, see my SearchStorage.com article on using storage replication for virtual machine disaster recovery preparedness.)
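One way to tie snapshots to change control is a simple audit that flags any VM whose most recent replicated snapshot is older than its most recent recorded change. The Python sketch below shows only that comparison; the two dictionaries, the VM names and the dates are made-up sample data, and in practice you would populate them from your change-management records and your snapshot or replication tooling.

```python
from datetime import datetime


def vms_needing_snapshot(last_change, last_replicated_snapshot):
    """Return the VMs whose DR-site snapshot no longer reflects their state.

    last_change              -- {vm_name: datetime of the latest recorded change}
    last_replicated_snapshot -- {vm_name: datetime of the latest snapshot
                                 known to have reached the DR site}
    A VM is flagged when it has no replicated snapshot at all, or when its
    newest replicated snapshot predates its newest change.
    """
    stale = []
    for vm, changed_at in last_change.items():
        snap_at = last_replicated_snapshot.get(vm)
        if snap_at is None or snap_at < changed_at:
            stale.append(vm)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    changes = {
        "web01": datetime(2008, 3, 1, 10, 0),
        "db01": datetime(2008, 3, 2, 9, 0),
        "mail01": datetime(2008, 2, 20, 8, 0),
    }
    snapshots = {
        "web01": datetime(2008, 3, 1, 11, 0),  # taken after the change
        "db01": datetime(2008, 3, 1, 9, 0),    # taken before the change
    }
    print(vms_needing_snapshot(changes, snapshots))  # ['db01', 'mail01']
```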
Data staging at a disaster recovery site
DR preparation involves more than a documented plan; you need to validate that a DR site has all the prerequisites for a successful recovery. At a minimum, successful recovery at a DR facility requires the following:
- Hardware resources to support re-staging or failover of the production site
- System-by-system hardware and software inventory
- System-by-system firmware inventory
- Backup media
- Data protection, OS, virtualization, and application software
- Recovery procedures
- Detailed network diagrams
In traditional physical server disaster recovery, detailed system configuration information -- hardware requirements, storage requirements, partition configurations, etc. -- is crucial in order to fully rebuild production systems. In VM recovery, VM configuration details are secured by backing up each VM's configuration files. So recovering VMs at the DR facility is much easier than recovering a physical system, especially if the hardware at the DR facility is not the same as the hardware at the production site. (For more on VM staging at a DR facility, read my SearchServerVirtualization.com article.)
Firmware documentation is an often-overlooked element of DR preparedness. Any firmware updates at a production site should also be applied to devices at the recovery site. Otherwise, differences in device drivers and firmware revisions between production and DR facilities may prevent physical host systems from successfully starting in the event of a disaster.
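If you keep a system-by-system firmware inventory for each site, checking for drift is a straightforward comparison. The Python sketch below assumes each inventory is a simple mapping from a device identifier to a firmware version string; the device names and version numbers are invented for the example, and how you collect the inventory is left to your own tooling.

```python
def firmware_mismatches(production, recovery):
    """Compare two {device: firmware_version} inventories.

    Returns {device: (production_version, recovery_version)} for every device
    whose versions differ. A version of None means the device is missing from
    that site's inventory.
    """
    report = {}
    for device in sorted(set(production) | set(recovery)):
        prod_version = production.get(device)
        rec_version = recovery.get(device)
        if prod_version != rec_version:
            report[device] = (prod_version, rec_version)
    return report


if __name__ == "__main__":
    # Hypothetical inventories for illustration only.
    production = {"host-bios": "1.4", "fc-hba": "2.1", "nic": "1.0"}
    recovery = {"host-bios": "1.4", "fc-hba": "2.0", "raid": "3.3"}
    for device, (prod, rec) in firmware_mismatches(production, recovery).items():
        print(f"{device}: production={prod} recovery={rec}")
```

An empty report does not prove the recovery hosts will boot, but a non-empty one is a cheap early warning that firmware updates applied in production have not reached the recovery site.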
You'll want to ensure that backup media, all necessary software, detailed recovery procedures and detailed network diagrams are available at the recovery site in order for the DR facility's local staff to more easily troubleshoot any problems that occur as VMs are brought online.
Disaster recovery automation
One of this year's major themes is data center automation, and automation tools have extended to disaster recovery procedures. If you manage VMware-based VMs, keep an eye on VMware Site Recovery Manager (SRM). With SRM, you can automate your disaster recovery plan with software, initiate that plan with a mouse click, and pre-program the sequence in which VMs are brought online at a disaster recovery site. During the course of this year, I expect other vendors to offer similar technologies as well.
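The idea of pre-programming the order in which VMs come online can be illustrated without any vendor tooling. The Python sketch below is not SRM's API or method; it simply takes a hypothetical map of VM dependencies and uses the standard library's graphlib module (Python 3.9 or later) to group VMs into startup waves, where every VM in a wave depends only on VMs in earlier waves. The VM names are invented for the example.

```python
from graphlib import TopologicalSorter


def startup_waves(depends_on):
    """Group VMs into ordered startup waves.

    depends_on -- {vm_name: set of VMs that must be running first}
    Returns a list of waves; VMs within one wave can be powered on together.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for a repeatable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical dependencies: directory services first, then databases
    # and mail, then the application tier, then the web front end.
    plan = {
        "dc01": set(),
        "db01": {"dc01"},
        "mail01": {"dc01"},
        "app01": {"db01"},
        "web01": {"app01"},
    }
    for number, wave in enumerate(startup_waves(plan), start=1):
        print(number, wave)
```

Writing the dependencies down in this form is useful even if a commercial tool ends up executing the plan, because it forces the recovery sequence to be explicit and reviewable.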
Virtualization transforms how we look at disaster recovery, but reaping its benefits requires substantial modifications to a DR plan. With tools such as PlateSpin Ltd.'s PowerConvert, you can even use virtualization to stage recovery VMs for physical production systems that have yet to be virtualized.
As a long-term strategy, use virtualization to provide a fully automated and easily testable disaster recovery plan. And soon, hopefully, all organizations that abandoned DR testing because of its demoralizing effect on IT will return to the fold and fully test their DR plans. If DR falls into your area of responsibility, knowing that your DR plan actually works should allow you to sleep better at night.
Chris Wolf is a senior analyst at Midvale, Utah-based Burton Group and the author of several IT books. Check out a chapter on backup from Wolf's book, Virtualization: From the Desktop to the Enterprise. In a recent SearchServerVirtualization.com podcast, Wolf shared additional tips on best practices and tools for virtualization-based disaster recovery.