Increasing concerns about data protection in virtual infrastructures are driving the growth of off-site or remote storage. Virtualization is a natural fit for off-site storage -- as long as you pay attention to a few key aspects.
The central issue with any off-site storage is ensuring that new or changed data at the primary virtualization storage site can be moved to or from the remote site within the constraints of the prevailing service-level agreement. That requirement brings the classic discussion of bandwidth and synchronization into play.
Bandwidth and replication challenges
Available bandwidth should meet reasonable data-transfer needs for virtualization storage. If 100 GB virtual machine (VM) snapshots need to be moved to the remote storage site, for example, there should be enough bandwidth to handle the transfers at a pace that meets agreed-upon service levels. If a new snapshot is taken before the previous one has finished transferring, the off-site copy can never catch up.
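A back-of-the-envelope check makes the bandwidth requirement concrete. The numbers below (link speed, efficiency, snapshot interval) are illustrative assumptions, not figures from this article:

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move data_gb over a link_mbps WAN, assuming a given
    effective utilization (protocol overhead, competing traffic)."""
    bits = data_gb * 8 * 1000**3              # decimal gigabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600

# A 100 GB snapshot over a 100 Mbps link at 80% efficiency:
hours = transfer_hours(100, 100)
print(f"{hours:.1f} h per snapshot")          # ~2.8 h

# If snapshots are taken every 2 hours, replication falls behind:
snapshot_interval_h = 2
print("keeps up" if hours <= snapshot_interval_h else "falls behind")
```

A calculation like this, run against your real snapshot sizes and schedule, tells you quickly whether the link can honor the service level at all.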
Synchronizing data sets is another issue when it comes to remote storage. Synchronous replication is a powerful way to maintain updated copies, but it is subject to distance limitations because of latency. More bandwidth does not overcome latency if the virtualization storage endpoints are too far apart.
With synchronous systems, when a write event occurs at the protected site, the command and its corresponding data must be sent to the recovery site, processed by the array at the disaster recovery (DR) location, and acknowledged back to the array at the production locale. Then and only then can the write event be deemed successfully completed and the operating system in the VM be notified that the write succeeded.
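That round trip can be modeled with a simple latency formula; the commit times and WAN latency below are assumed, illustrative values:

```python
def sync_write_latency_ms(local_write_ms: float, remote_write_ms: float,
                          one_way_wan_ms: float) -> float:
    """Latency the VM's OS sees for one synchronous write: the data must
    reach the DR array, be committed there, and the acknowledgement must
    return before the primary array reports success."""
    round_trip = 2 * one_way_wan_ms
    return local_write_ms + remote_write_ms + round_trip

# Assumed figures: 1 ms local commit, 1 ms remote commit, and roughly
# 5 ms of one-way latency (on the order of 1,000 km of fiber):
print(sync_write_latency_ms(1, 1, 5))   # 12.0 ms per write
```

Note that the WAN round trip dominates as distance grows, and no amount of extra bandwidth shrinks it -- which is why synchronous replication is distance-limited.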
Many operating systems enforce internal disk-timeout thresholds; if a write's acknowledgement takes longer than the threshold, the OS and applications treat that disk event as a failed write.
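On Linux guests with SCSI disks, for example, that threshold is exposed through sysfs. The device name and timeout value below are illustrative; check your own guest's configuration and your hypervisor vendor's guidance before changing it:

```shell
# Read the current per-device I/O timeout, in seconds (device name varies):
cat /sys/block/sda/device/timeout

# Raise it so a slow synchronous-replication acknowledgement is not
# treated as a failed write (requires root; value is illustrative):
echo 180 > /sys/block/sda/device/timeout
```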
Asynchronous replication eases this off-site storage problem, but data copies can fall out of sync by 15 minutes or more. WAN acceleration products can offset latency and enhance the effective bandwidth. Many organizations, though, opt for a two-tiered data-protection strategy with virtualization storage, which involves the creation of snapshots to local storage. Those snapshots are then copied to off-site storage asynchronously as bandwidth allows.
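The two-tiered pattern can be sketched as follows. This is a minimal, hypothetical model -- real hypervisor snapshot and replication APIs differ -- but it shows the key idea: snapshots complete locally right away, and a separate drain step ships them off-site as bandwidth allows:

```python
from collections import deque

# Tier 1: snapshots land on local storage immediately, no WAN involved.
local_snapshots: deque[str] = deque()

def take_local_snapshot(vm: str, seq: int) -> str:
    snap = f"{vm}-snap-{seq}"
    local_snapshots.append(snap)
    return snap

def drain_to_offsite(max_transfers: int) -> list[str]:
    """Tier 2: asynchronously copy as many queued snapshots off-site
    as the current WAN budget allows; the rest wait in the queue."""
    shipped = []
    while local_snapshots and len(shipped) < max_transfers:
        shipped.append(local_snapshots.popleft())
    return shipped

take_local_snapshot("db01", 1)
take_local_snapshot("db01", 2)
print(drain_to_offsite(max_transfers=1))   # ['db01-snap-1']; snap-2 waits
```

The local tier keeps recovery points frequent and fast; the asynchronous tier accepts that the off-site copy lags by however long the queue takes to drain.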
Reducing the total data set by omitting nonessential content is another way to ease these remote storage problems. For instance, an organization may replicate only its 10 mission-critical VMs instead of all 50, which shrinks the data set and speeds replication. Doing so may require moving VMs out of logical unit numbers (LUNs)/volumes they share with non-mission-critical VMs; relocating the noncritical VMs off the replicated LUNs/volumes saves bandwidth and uses less space at the DR location.
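The savings from selective replication are easy to estimate. The VM counts and sizes below are assumed for illustration:

```python
def replication_footprint(vm_sizes_gb: dict[str, float],
                          critical: set[str]) -> tuple[float, float]:
    """Return (replicated_gb, saved_gb) when only the mission-critical
    VMs are placed on replicated LUNs/volumes."""
    replicated = sum(gb for vm, gb in vm_sizes_gb.items() if vm in critical)
    total = sum(vm_sizes_gb.values())
    return replicated, total - replicated

# Assumed example: 50 VMs of 40 GB each, 10 of them mission-critical.
sizes = {f"vm{i:02d}": 40.0 for i in range(50)}
critical = {f"vm{i:02d}" for i in range(10)}
rep, saved = replication_footprint(sizes, critical)
print(rep, saved)   # 400.0 1600.0
```

In this example only a fifth of the data crosses the WAN and lands on DR-site disks, which directly relaxes both the bandwidth and the remote-capacity requirements.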
If you find that the replication process for remote storage is a space hog, it's also worth reconsidering your policies for placing VMs into data stores. When staffers create virtual machines in an uncontrolled fashion, critical VMs may be omitted from your replication plans, while machines that have no need to be replicated gobble up precious bandwidth. It also pays to break up the parts of a VM so that ancillary files are not rolled into your replication plans -- for example, by creating a virtual disk specifically for holding log files or guest OS page files, and by relocating the VM swap file to volumes or LUNs not marked for replication.
Other off-site storage roadblocks
Virtualization also presents two other concerns for off-site storage. Virtualization allows replicated data to exist on diverse storage hardware. SAN content can be replicated to off-site storage such as SATA storage systems, for example. Although this virtualization storage method saves an organization a great deal of money, it's a potential hurdle if the remote site must perform actual work, as a "warm" site does.
Virtualization storage administrators must select off-site storage that fits with the site's intended purpose. A warm remote site that takes over to service users will generally need servers and storage with performance characteristics similar to those of the main data center.
"A lot of [data] centers take their older hardware and move it from the primary center to the remote center," Silverton Consulting's Lucchesi said. "If you actually ever have to use [hardware] at the secondary center, it still has to support your primary workload."
Regardless of the methodology and technologies used to move data to a remote storage site, every organization needs a proven process for bringing data back (or failing back) to the main data center. If the remote data is only a backup, the restoration process or recovery time objective should meet the agreed service level. This problem with virtualization storage is more complicated for an active site, because that site must remain active while the data is copied back to the main data center and then reconciled with existing data.
"Once we fail over to the DR site, failing that data back may be difficult," said Scott Gorcester, president of Moose Logic, an IT managed services provider in Bothell, Wash. "We may be able to get a replica of the data and then trickle changes over time, but once we end up with a terabyte or two of data over there that has changes, we have to consider an effective strategy for bringing all that data back into the primary data center."
Virtualization storage brings a new level of versatility to the enterprise, but administrators must exercise care in its use.
When evaluating your virtualization storage method, establish a manageable process for recovery that can be monitored. Remote storage levels should be monitored regularly and adjusted to accommodate actual application use. Off-site storage should also be evaluated for recovery performance. Then test that process frequently to verify that it works as intended and that the IT pros tasked with managing it are proficient in its use.
This was first published in August 2011