Ensuring uptime and recoverability in the event of a failure is a critical part of any virtualization professional's job. Virtualization makes business continuity easier to provide than it is with traditional physical servers, but the work is still complicated, and disaster recovery strategies vary widely among organizations.
Virtualization creates an abstraction layer between the hardware and the operating system and applications, which allows for great flexibility when it comes to recovering systems after a disaster. Many virtualization platforms can automatically migrate and restart virtual machines (VMs) on alternate hardware in case of a failure. However, that solves only part of the business continuity and disaster recovery (BC/DR) problem. Virtualization architects must also identify the most critical applications and data to protect, how much data can be lost and how long applications and data can be unavailable. Once they define these criteria, they can select the appropriate backup, replication, clustering and automation solutions to achieve those goals.
Virtualization reduces disaster recovery challenges
Virtualization reduces the challenges associated with disaster recovery (DR) by providing hardware independence. When you virtualize servers, they are encapsulated in VMs that are independent of the underlying hardware. As a result, you no longer need identical physical servers at the primary and secondary data center sites.
Virtualization also provides advanced availability, backup and replication features that aren't an option in a traditional setup. Virtualization platforms typically include features that allow you to move running VMs from host to host, move VM storage from one storage area network (SAN) to another or provide high availability (HA) in the event of a host failure. The platform can also track which blocks of a VM's disk files change as the VM runs, so only those changed blocks need to be backed up or replicated, which improves recoverability.
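The block-tracking idea can be illustrated with a toy model. The sketch below is a minimal Python illustration, assuming a disk made of fixed-size blocks; `TrackedDisk`, `take_changed_blocks` and `apply_delta` are hypothetical names, not any hypervisor's API (VMware vSphere, for example, implements this inside the platform as Changed Block Tracking). Only the blocks written since the last pass are copied to the replica.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes


class TrackedDisk:
    """Toy virtual disk that records which blocks have changed (hypothetical, not a hypervisor API)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.changed: set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        # Writes are whole blocks in this toy model.
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.changed.add(index)  # remember that this block differs from the last copy

    def take_changed_blocks(self) -> dict[int, bytes]:
        """Return blocks written since the last call, then reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


def apply_delta(replica: list[bytes], delta: dict[int, bytes]) -> None:
    """Copy only the changed blocks onto the replica."""
    for index, data in delta.items():
        replica[index] = data


disk = TrackedDisk(num_blocks=8)
replica = list(disk.blocks)  # initial full copy at the DR site
disk.write(3, b"a" * BLOCK_SIZE)
disk.write(5, b"b" * BLOCK_SIZE)
delta = disk.take_changed_blocks()  # only blocks 3 and 5 are sent
apply_delta(replica, delta)
print(sorted(delta), replica == disk.blocks)
```

After the initial full copy, each pass transfers two blocks instead of eight; on a real multi-terabyte VM disk that difference is what makes frequent backups and replication affordable.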
Unlike traditional physical server technologies, these features apply to all VMs, no matter which OS or applications they run.
Consolidating physical servers with virtualization also provides significant cost savings. Servers and storage at the recovery site often cost less, because consolidated VMs run on fewer physical hosts and that hardware does not have to match the primary site. Finally, with virtualization, DR planning and failover testing are easier, can cover all servers instead of a few and cost significantly less.
Planning a virtual disaster recovery strategy
While virtualization provides disaster recovery benefits over a traditional system, the steps for developing a BC/DR plan for a virtual infrastructure are not new. The core of any DR plan is deciding which systems and data are most critical and then developing a solid disaster recovery strategy to protect them. Here's how to start:
- Get executives' support for the time and expense of creating and testing a DR plan.
- List potential disaster scenarios and categorize them based on scale and impact.
- Document and analyze the current data center infrastructure, including the applications and data running in it (i.e., document all servers, storage, network, applications, power and cooling requirements).
- Take inventory of the current disaster recovery strategy and all dependencies related to the data center (for example, a backup generator may be in place but will run out of fuel within 24 hours).
- Define service-level expectations and contingency plans.
- Establish and test the BC/DR plan.
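To make the categorized scenario list from the second step actionable, a team might keep it as a small, sortable register. The sketch below assumes a simple score of scale multiplied by impact; the `Scale` and `Impact` categories, the scoring rule and the example scenarios are all hypothetical placeholders, not a standard methodology.

```python
from dataclasses import dataclass
from enum import IntEnum


class Scale(IntEnum):
    # Hypothetical scale categories; adjust to your own plan.
    SINGLE_SYSTEM = 1
    RACK_OR_SAN = 2
    WHOLE_SITE = 3
    REGION = 4


class Impact(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Scenario:
    name: str
    scale: Scale
    impact: Impact

    @property
    def priority(self) -> int:
        # Hypothetical ranking rule: larger, more damaging events rank higher.
        return int(self.scale) * int(self.impact)


scenarios = [
    Scenario("Host hardware failure", Scale.SINGLE_SYSTEM, Impact.MEDIUM),
    Scenario("SAN outage", Scale.RACK_OR_SAN, Impact.HIGH),
    Scenario("Extended power loss at primary site", Scale.WHOLE_SITE, Impact.HIGH),
    Scenario("Accidental VM deletion", Scale.SINGLE_SYSTEM, Impact.LOW),
]

# Highest-priority scenarios first, so planning effort goes where it matters most.
ranked = sorted(scenarios, key=lambda s: s.priority, reverse=True)
for s in ranked:
    print(f"{s.priority:>2}  {s.name}  ({s.scale.name}, {s.impact.name})")
```

Any scoring rule works as long as it is applied consistently; the point is that the register makes the ordering explicit and reviewable when the plan is tested.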
The approach to DR differs for every company, but critical functions are similar. Most organizations have a few critical applications that are the backbone of the business. Most have systems that they use to communicate (e.g., email, VoIP phone systems, an internal SharePoint site or internal instant messaging). To support those critical functions, companies need a DR plan and tools available if disaster strikes. This could be as simple as a PDF of the plan that identifies the steps to perform and a laptop with remote virtual private network (VPN) access to the DR site for someone in IT.
Two metrics set the service-level targets the plan must meet:
- Recovery point objective (RPO): The maximum amount of data, measured in time, that the company can afford to lose in a disaster. For example, if your company can tolerate losing 24 hours of data, your RPO is 24 hours, and nightly backups are enough to meet it.
- Recovery time objective (RTO): The maximum time that the company can tolerate applications being unavailable. For example, if your RTO for email is four hours, your DR plan must be able to restore email service for the entire company within four hours.
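Both objectives reduce to simple comparisons once the plan's numbers are known. This minimal sketch assumes the worst-case data loss equals the interval between recovery points and ignores backup duration and replication lag; the function names and the three-hour restore estimate are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, a failure just before the next backup loses one full interval of data."""
    return backup_interval <= rpo


def meets_rto(estimated_restore: timedelta, rto: timedelta) -> bool:
    """The plan meets the RTO if service is back within the tolerated downtime."""
    return estimated_restore <= rto


# Nightly backups against a 24-hour RPO, and a hypothetical
# three-hour email restore estimate against a four-hour RTO.
print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=24)))
print(meets_rto(timedelta(hours=3), rto=timedelta(hours=4)))
# Nightly backups cannot satisfy a tighter four-hour RPO.
print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=4)))
```

In practice the restore estimate should come from an actual failover test rather than a guess, which is one reason testing belongs in the plan.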
Virtualization can make meeting RPOs and RTOs easier. Critical VMs can be replicated to another data center to prevent data loss. Because replication copies corrupted data as faithfully as good data, VMs can also be backed up every hour so that multiple recovery points are available. And to protect applications from a server failure or an OS crash inside a VM, a hypervisor's built-in high availability can automatically restart VMs on another host, or restart a VM whose operating system has stopped responding.
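Hourly recovery points are useful only if you can pick the right one. The sketch below, assuming hourly VM backups and a known time at which corruption began, selects the newest restore point taken before that time; the helper names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def hourly_restore_points(start: datetime, count: int) -> list[datetime]:
    """Timestamps of hypothetical hourly VM backups."""
    return [start + timedelta(hours=i) for i in range(count)]


def latest_clean_point(points: list[datetime], corrupted_at: datetime) -> Optional[datetime]:
    """Newest restore point taken strictly before the corruption began, or None."""
    clean = [p for p in points if p < corrupted_at]
    return max(clean) if clean else None


points = hourly_restore_points(datetime(2024, 1, 1, 0, 0), count=24)
# Assume corruption began at 13:20 and was already replicated to the DR site.
choice = latest_clean_point(points, corrupted_at=datetime(2024, 1, 1, 13, 20))
print(choice)  # 2024-01-01 13:00:00
```

Restoring the 13:00 point loses 20 minutes of changes, which stays within a one-hour RPO; a replica alone would offer only the already-corrupted copy.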