With all of the options admins have today, it can be easy to forget about virtual server backups, especially with replication being used in the data center. However, replication doesn't cover everything. The primary purpose of a backup is to create a copy of important data that isn't exposed online to hackers, and that survives software failures and the occasional mistakes system admins make. Clouds and virtualized environments bring their own challenges when it comes to performing backups: VMs are transient and data is in constant motion. System admins need a virtual server backup strategy in place to ensure that every backup is handled correctly.
The focus in handling virtual systems is to impose a data management structure. It's important to figure out what data needs to be saved and where the primary copy exists. This information then has to be overlaid with a backup frequency based on recovery point objective (RPO) policies, which will likely differ from data set to data set. Here's where replication does have an impact: done properly, with geodiversity across multiple zones, it negates many failure modes, such as hardware or power problems.
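The per-data-set RPO idea can be reduced to a simple scheduling rule. The sketch below is illustrative only; the data set names and intervals are hypothetical, not drawn from any particular backup product.

```python
from datetime import datetime, timedelta

# Hypothetical RPO policy: the maximum tolerable data loss per data set,
# expressed as a backup interval. Names and values are illustrative.
RPO_POLICY = {
    "billing-db": timedelta(minutes=15),
    "user-uploads": timedelta(hours=4),
    "build-artifacts": timedelta(hours=24),
}

def backup_due(data_set: str, last_backup: datetime, now: datetime) -> bool:
    """A data set is due for backup once the time since its last
    successful backup meets or exceeds its recovery point objective."""
    return now - last_backup >= RPO_POLICY[data_set]
```

A scheduler loop would evaluate `backup_due` per data set, so a billing database gets backed up far more often than a pool of build artifacts.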
Geodiversity and snapshots as part of a virtual server backup strategy
Geodiversity, coupled with frequent snapshots, provides a good level of availability and data integrity. The snapshots limit exposure to software events while providing controlled rollback. Hacking, unfortunately, presents a different picture, since a hack may go unnoticed for days or even weeks, as some recent, high-profile cases have demonstrated. Of course, geodiversity may not be an easy option with purely private clouds or virtual server clusters, which re-emphasizes backup as a protection vehicle.
The art in handling these issues is to determine when the hack began, which establishes a baseline picture of the stored data, and then to understand which data sets the hack affected. This is where good backup packages stand out from the pack: in many ways, the measure of good backup software is how powerful its recovery tools are.
The backup strategy in a virtual environment has two parts. Data stored in networked storage can be protected at the storage appliance, which simplifies protecting data shared by many machines and provides a simpler vehicle for recovery. Data files specific to a particular VM, on the other hand, need to be treated much the same way server files are, especially if the VMs have local instance storage.
Networked storage is best handled by a snapshot-then-backup process, which keeps the data self-consistent if recovery is needed. Incremental backup saves on WAN traffic, whether or not the data stages to local disk before being moved to a distant location.
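The core of the incremental pass can be sketched as a change scan over the snapshotted tree. This is a minimal sketch, assuming the snapshot has been taken and mounted at `root` beforehand; real tools track changed blocks rather than file modification times.

```python
import os

def incremental_candidates(root: str, last_backup_ts: float) -> list[str]:
    """Walk a (snapshotted) file tree and return the paths modified since
    the last backup. Copying only these keeps WAN traffic down; pointing
    at a mounted snapshot keeps the data self-consistent during the copy."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_ts:
                changed.append(path)
    return changed
```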
Where data management is less structured, as when many tenants access the VM pool, networked storage backup lacks the visibility to handle the fragmented data map. Here, the best option is to fall back on virtual machine backup. There are two approaches: back up a set of selected files on each machine, or back up the whole VM. Often, the latter wins out simply because it is easier to set up and manage and, just as importantly, easier to restore.
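The difference between the two approaches is mostly one of scope. A minimal sketch, using a gzipped tar archive as a stand-in for whatever format a given backup tool actually uses; the function names are illustrative:

```python
import tarfile

def backup_files(archive: str, paths: list[str]) -> None:
    """File-level backup: capture only the selected guest files.
    Needs a file list per machine, but the archives stay small."""
    with tarfile.open(archive, "w:gz") as tar:
        for p in paths:
            tar.add(p)

def backup_whole_vm(archive: str, vm_dir: str) -> None:
    """Whole-VM backup: capture the VM's entire directory (disk images
    plus configuration). Simpler to set up, and restoring means just
    unpacking one archive rather than rebuilding a machine piecemeal."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(vm_dir)
```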
One common mistake is to assume that VMs behave just like legacy environments. Admins use traditional backup tools, with agents running inside the VMs. In many cases, these tools are deliberately kept out of date -- a couple of versions behind -- in the name of demonstrated stability. With the rapid evolution of software in cloud and virtual environments, this prevents the use of pathways and APIs specifically designed for efficient backup, which can slow operations dramatically.
There are many tools that support VM backup. The large cloud providers have their own offerings, as do hypervisor vendors. Third-party tools take advantage of the API sets and offer their own approaches, especially in the recovery area.
A final issue is the location of the backup. WAN performance is generally not keeping up with traffic growth, especially in the United States. This is less of an issue with public cloud virtual machines, which can use local storage pools as a first stop in the backup process and then rely on the inherent geodiversity of these clouds to move the data off-site. The public clouds also offer archive storage, and recent improvements by Google, for example, have brought archive access speeds close to those of online data for a fraction of the cost.
For private clouds and simpler virtualized clusters, local backup is the near-term answer, with an unintegrated transfer of data to a public cloud as an option. The move toward hybrid clouds, however, opens up in-cloud storage, with its fringe benefits in geodiversity and ease of use. Ultimately, cloud storage has too many benefits to ignore, and it will likely end the use of local storage mechanisms and tape libraries. These will be replaced by cloud backup gateways, likely themselves running in virtual machines, with recent backups cached locally for a while, since evidence suggests recent backups account for most restores.
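The gateway's caching behavior can be sketched as follows. This is a minimal sketch under stated assumptions: every backup ships to the cloud via an `upload` callback (a stand-in for whatever cloud API a real gateway uses), while only the most recent backups stay on local disk for fast restores.

```python
from collections import OrderedDict

class BackupGatewayCache:
    """Sketch of a cloud backup gateway's local cache: all backups go to
    the cloud, and the newest ones are also kept locally, since most
    restores want recent data. Illustrative only, not a real product."""

    def __init__(self, capacity: int, upload):
        self.capacity = capacity
        self.upload = upload        # callback shipping a backup to the cloud
        self.local = OrderedDict()  # backup_id -> data, newest last

    def store(self, backup_id: str, data: bytes) -> None:
        self.upload(backup_id, data)   # every backup goes to the cloud
        self.local[backup_id] = data   # and is cached locally
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the oldest local copy

    def restore(self, backup_id: str):
        """Fast path: serve from the local cache. A miss would fall back
        to a cloud fetch, which is not shown here."""
        return self.local.get(backup_id)
```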
How virtual server backup differs from traditional backup