Disasters happen. Fires, floods and massive storms can disrupt operations. Even times of civil unrest, wars and acts of terrorism can shutter a facility indefinitely. These scenarios make disaster planning a critical part of any IT preparation; it allows a business to recover and keep running in the face of almost any situation. Virtualization has added more flexibility and options to disaster recovery (DR), but it also adds some complexity that IT professionals will need to consider.
In this Q&A, Phil Sweeney, managing editor with TechTarget's Data Center and Virtualization Media group, talks with Steve Bigelow, senior technology editor, about how virtualization has changed disaster recovery trends.
Virtualization has had a big impact on disaster recovery trends, hasn't it?
Bigelow: Yes, it has. Virtualization makes business workloads hardware-agnostic, which makes them easier to move and recover, but that flexibility demands more thought and planning.
What are some of the biggest problems with recovery in a virtualized data center?
Bigelow: The two recovery problems that jump out are restoration time and staff mistakes. Let's talk about restoration time first.
It takes time to restore a workload from a virtual image in storage to a working server and then restart it. Once a workload is restarted, it uses a part of the available bandwidth. That's fine, but virtualization means a server might host five, 10 or 20 workloads. So every workload that restores and restarts will leave less bandwidth available for subsequent workload restorations. This means restoration will likely get slower as the server fills up.
There are two ways to address this. One approach is to leave every workload quiesced until all of them are restored, and then start them up together. This eases bandwidth congestion, but it can delay important workloads because everything has to wait until the server is fully repopulated before any workload is activated. The alternative is to prioritize your restorations so the most important workloads are restored and activated first, while less important workloads take a little longer to restore.
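The second approach Bigelow describes amounts to a simple priority ordering of the restore queue. A minimal sketch of that idea (the workload names, priority values and helper are hypothetical, not part of any hypervisor's API):

```python
# Sketch: prioritized restoration -- restore and activate the most critical
# workloads first so they aren't starved of bandwidth later in the process.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    priority: int  # 1 = most critical

def restore_order(workloads):
    """Return workloads sorted so the most critical restore first."""
    return sorted(workloads, key=lambda w: w.priority)

vms = [Workload("reporting", 3), Workload("orders-db", 1), Workload("intranet", 2)]
plan = restore_order(vms)
print([w.name for w in plan])  # the transactional database leads the queue
```

In practice the priority values would come from a business-impact analysis rather than being hard-coded, but the ordering principle is the same.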
The second issue is staff knowledge and practice; people need to know the restoration process and be comfortable with the tools involved. It's easy to schedule backups and snapshots to protect data, but there usually isn't much opportunity to practice restorations in traditional physical environments, because nobody wants to disrupt production by restoring a backed-up application and hoping it works right. Virtualized environments remove that constraint: it's easy to restore workloads to target servers in a lab or test environment without touching production at all. That makes it far easier to keep an IT staff trained and sharp, so restorations are quicker, with fewer errors.
How has virtualization changed DR approaches? Is there a preferred approach today?
Bigelow: I think the biggest change that I've noticed in disaster recovery trends is the use of snapshots -- particularly incremental snapshots -- to preserve data.
The traditional DR approach was tape. Everything was backed up to tape, and then everything had to be restored from the tape. Tape backup and restoration took a long time, so this meant recovery point objectives (RPOs) and recovery time objectives (RTOs) were very long. There was a high risk of substantial data loss, and workloads were unavailable for a long time, so this was very disruptive for the business. Tape also posed the problem of physically relocating and retrieving the media. You don't want to store tapes in the same facility that might burn down.
Backup and restoration from disk were a lot faster, but they still relied on time-consuming backup and restoration strategies. Disk-based backups also weren't portable, so early disk-based DR systems relied on limited wide-area network (WAN) bandwidth to replicate data to remote storage systems. Disk-based DR was a huge improvement over tape, but recovery still took a lot of time.
In a virtual environment, every workload can be protected by real-time snapshots that capture the state of each VM [virtual machine] at that moment in time. A snapshot is also very fast -- most workloads don't even need to be quiesced -- so the VM state can be captured and replicated to remote storage systems over today's high-bandwidth WAN links and WAN accelerators. This means VMs can be protected with much shorter RPOs and have shorter RTO expectations.
With incremental snapshots, it's possible to capture just the changes that occurred to a VM since the last capture. The result is even faster snapshots and extremely short RPOs, and smaller changes use less storage -- you're only saving the changes from capture to capture rather than making entirely new captures. Less storage means there is far less raw data to replicate and synchronize between remote DR sites.
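The delta-capture idea behind incremental snapshots can be sketched in a few lines. This is an illustration of the concept only; real hypervisors track dirty blocks at the storage layer rather than diffing full images:

```python
# Sketch: an incremental snapshot stores only the blocks that changed since
# the last capture, shrinking both snapshot time and replication traffic.
def incremental_snapshot(prev_blocks, curr_blocks):
    """Return only the blocks that differ from the previous capture."""
    return {i: b for i, b in curr_blocks.items() if prev_blocks.get(i) != b}

base = {0: b"boot", 1: b"data-v1", 2: b"logs"}
now  = {0: b"boot", 1: b"data-v2", 2: b"logs"}
delta = incremental_snapshot(base, now)
print(delta)  # only the changed block needs to be stored and replicated
```

Replicating `delta` instead of the full image is what keeps RPOs short even over constrained WAN links.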
Snapshot technologies also allow organizations to prioritize their data protection scheme. So we're not in a situation where everything has to be backed up the same way every time. Full and incremental snapshots let organizations tailor data protection to the importance of the workload. A critical VM might receive almost continuous incremental backups that are replicated to two remote sites, while less critical VMs might only receive occasional snapshots.
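A tiered protection scheme like the one described above is often expressed as a simple policy table. A minimal sketch, with illustrative tier names, intervals and replica counts that any real deployment would set from its own RPO targets:

```python
# Sketch: data protection tailored to workload importance -- critical VMs get
# near-continuous incremental snapshots replicated to two sites, while
# non-critical VMs get occasional snapshots. All values are illustrative.
POLICY = {
    "critical":     {"snapshot_interval_min": 5,    "replica_sites": 2},
    "important":    {"snapshot_interval_min": 60,   "replica_sites": 1},
    "non-critical": {"snapshot_interval_min": 1440, "replica_sites": 1},
}

def protection_for(tier):
    """Look up the snapshot schedule and replication target count for a tier."""
    return POLICY[tier]

print(protection_for("critical"))
```

The point of the table is that protection cost scales with workload importance instead of every VM paying the same overhead.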
So with virtualization, is it possible to scrap traditional data backup and restoration entirely?
Bigelow: There are data protection technologies that make that possible, but scrapping a tried-and-true data protection scheme might not be the right choice for everyone. Virtualization adds flexibility -- it adds choices -- and a business will get [the] best results when the choice is matched with the need.
One big example is VM redundancy tools like Stratus Technologies' everRun Enterprise. This software basically synchronizes two independent copies of a VM installed on two different physical servers. If one copy of the VM fails, the duplicate steps in and continues working without interruption. This keeps the workload available and gives technicians time to resolve problems with the other copy or its underlying hardware. It's not clustering in the strictest sense, but it improves workload availability within one data center or across remote sites for DR and business continuance.
But you can understand that this kind of technology takes a bigger investment in hardware -- you wind up running more iterations of each VM, and this takes more physical servers -- not to mention, the cost of the availability software and the additional operating system and workload licenses. It's a strategy that might be just right for mission-critical workloads like transactional databases, but it might not make any sense for non-critical workloads that might get just enough protection from ordinary snapshots.
The answer here is to match the technology to the need; it's unwise to protect every workload the same way.
So with all of these changes, how is virtualization simplifying disaster recovery trends and business continuance?
Bigelow: The benefit here isn't really simplification. Virtualization isn't making things simpler, but it adds a lot of choices for data protection and disaster planning, so the benefit is a bigger toolset -- a business with virtualization can select several different strategies to protect its workloads and data. Data protection is no longer a single, monolithic approach like tape backup.
We talked about a few of these new options, like full and incremental snapshots. We also talked about redundant VM tools like everRun for workload resiliency. Both of these approaches can be tailored to the specific needs of your business, available network bandwidth, storage resources, and the relative importance of each workload.
But virtualization allows other capabilities that can pre-empt disasters. For example, virtualization provides live migration, so workloads can be moved between servers. This helps with workload balancing to improve server performance, but it also lets IT staff offload VMs from stressed or troubled machines before they fail. The concept of scheduled downtime or planned disruption is basically a thing of the past. Businesses can repair or upgrade systems as needed without interrupting a workload's availability, or move workloads to remote sites where user demand for that workload is higher.
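The pre-emptive side of live migration boils down to a health check that triggers evacuation before hardware fails. A minimal sketch; the metric names and thresholds are hypothetical, and a real system would feed them from hardware sensors and the hypervisor's monitoring stack:

```python
# Sketch: decide whether to live-migrate VMs off a host pre-emptively,
# based on health signals, before the host actually fails.
def should_evacuate(host_metrics, temp_limit_c=75, ecc_error_limit=10):
    """Return True when host health signals exceed safe thresholds."""
    return (host_metrics["cpu_temp_c"] > temp_limit_c
            or host_metrics["ecc_errors"] > ecc_error_limit)

stressed = {"cpu_temp_c": 82, "ecc_errors": 3}
print(should_evacuate(stressed))  # overheating host -> migrate workloads now
```

Because migration happens while the host is still up, users never see the maintenance window that would follow.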
So, the overall thought process is changing from "recovery" to "resilience," and [many are] using virtualization technologies to continue working through adversity.