Resource management in a virtual datacenter can be complex. It takes delicacy and skill to make virtual machines (VMs) always available and ensure fast disaster recovery and reliable failover of faulty guests, hosts and other pieces in the environment.
My virtual infrastructure series can help IT managers add VM management to their physical server management skillset. I cover the new technologies in virtualization and discuss the lack of effective solutions in some areas. In part one of this installment, I cover VM backup processes. In part two, I look at failover and clustering in virtualized environments.
In a virtual datacenter, backup strategies can mimic traditional ones: you install a backup agent inside every guest operating system (OS) and copy files, partitions or the whole virtual disk to another location.
This approach works well, but it has a painful downside in virtual environments: every VM shares the same I/O channel offered by the host OS. So, if multiple VMs start their backups at the same time, an I/O bottleneck is unavoidable.
To prevent this traffic jam, administrators should carefully plan a wave of backups, putting enough delay between them so that guest OSes never overlap during these intensive operations.
Unfortunately, this method is not scalable. You cannot avoid overlapping backups when there are more than a few VMs, because a single backup can take hours when each virtual disk could hold as much as 20GB of data, depending on application requirements.
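To make the scaling problem concrete, here is a minimal sketch of the arithmetic behind a staggered schedule. The names (BackupJob, staggered_schedule, fits_window) and every figure except the 20GB disk size are my own assumptions for illustration, not values taken from any product.

```python
from dataclasses import dataclass


@dataclass
class BackupJob:
    vm_name: str
    disk_gb: float


def staggered_schedule(jobs, throughput_mb_s, gap_minutes=0.0):
    """Return (vm_name, start_minute, end_minute) tuples for sequential jobs.

    Jobs run strictly one after another, so no two guests use the
    host I/O channel at the same time.
    """
    schedule = []
    clock = 0.0
    for job in jobs:
        # Minutes needed to copy the whole virtual disk at the given rate.
        duration = job.disk_gb * 1024 / throughput_mb_s / 60
        schedule.append((job.vm_name, clock, clock + duration))
        clock += duration + gap_minutes
    return schedule


def fits_window(schedule, window_hours):
    """True if the last job finishes inside the backup window."""
    return not schedule or schedule[-1][2] <= window_hours * 60


# Assumed figures: 30 guests, 20GB each, 30 MB/s sustained, 5-minute gaps.
jobs = [BackupJob(f"vm{i:02d}", 20) for i in range(30)]
plan = staggered_schedule(jobs, throughput_mb_s=30, gap_minutes=5)
print(fits_window(plan, window_hours=8))
```

Under these assumed numbers the last job finishes a little more than eight hours after the first one starts, so an eight-hour overnight window is already too short for 30 guests, while a handful of guests fit comfortably.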
Guest OS backup also obliges administrators, at restore time, to first recreate empty virtual machines and then boot a bare-metal recovery CD inside them.
A potentially risky option
An alternative approach consists of backing up guest OSes at the host level, transparently to the guests themselves.
Since VMs are self-contained in single files dwelling on the host OS file system, much like a spreadsheet or a picture, virtualization newcomers may think backup is easier. Instead, it's much more difficult.
First of all, VMs are treated just like open files locked by a process or an application. (Think of a .PST mail archive held by Microsoft Outlook.) These files must be accessed in special ways, by freezing an image of their state (what we usually call a snapshot) and backing up that image.
This task can be accomplished only if the backup software knows how to handle open files, although in some cases the host OS can help. For example, Microsoft Windows Server 2003 offers a feature called Volume Shadow Copy Service (VSS), which third-party solutions can invoke to take snapshots.
Even knowing how to handle open files, we still have to face another challenge to perform a live backup: virtual machines are not just open files but complete operating systems accessing a complete set of virtual hardware.
Each time a state snapshot is taken, everything freezes, including virtual memory and interrupts. From the guest's point of view, this is the equivalent of a power failure, which may or may not corrupt the guest file system structure.
This is an approach few vendors support, even when a robust OS that does not corrupt data on power failures is in place. One vendor taking this tack is vizioncore, which built its popularity on esxRanger, a product able to perform VM live backup on VMware ESX Server with a fair amount of automation.
For those brave enough to try such a scenario even without support, there is the famous VMBK script, made by Massimiliano Daneri, which performs a basic live backup for VMware ESX Server virtual machines as well.
Microsoft will offer this kind of support for its Virtual Server 2005, starting with the imminent Service Pack 1, but will not allow use of the standard Microsoft Backup for the task.
The backup road most travelled
The generally accepted VM approach, and the only one really endorsed by virtualization vendors to work around the hot shutdown issue, is to suspend or shut down running VMs, perform the backup and then resume or restart them. Unfortunately, this process is at odds with offering highly available services, and it obliges administrators to use traditional agent-based backup approaches for mission-critical VMs.
Live backup problems will eventually be addressed when operating systems become more virtualization-friendly, but it is worth noting that even this second approach puts stress on host I/O channels.
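The suspend, copy, resume sequence can be sketched as follows. This is only an illustration under my own assumptions: the suspend and resume callables are hypothetical hooks that you would wire to whatever control interface your virtualization platform provides, and the disk files are copied with plain file operations.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def vm_paused(vm_name, suspend, resume):
    """Suspend a VM for the duration of the block, then always resume it."""
    suspend(vm_name)
    try:
        yield
    finally:
        # Resume even if the copy fails, so the service is not left down.
        resume(vm_name)


def cold_backup(vm_name, disk_files, dest_dir, suspend, resume):
    """Copy a VM's disk files while it is suspended; return the copied paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    with vm_paused(vm_name, suspend, resume):
        for src in disk_files:
            target = dest / Path(src).name
            shutil.copy2(src, target)
            copied.append(target)
    return copied
```

The point of the try/finally wrapper is that the guest is unavailable for exactly as long as the copy takes, and comes back even if the copy fails. That downtime is what makes this approach hard to reconcile with high availability.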
To completely eliminate the problem, we have to move the backup point from host to storage facility, where our VMs' files can be manipulated without impacting the virtualization platform directly.
VMware is the first to use this solution, but today its Consolidated Backup (VCB) product has notable limitations: it works only with ESX Server; it acts only as a proxy for real third-party backup solutions (obliging customers to install and configure different scripts for different products); and it cannot perform a restore.
Staying at the storage level, there's a different backup method: using storage area network (SAN) management software to clone LUNs. This approach usually doesn't provide enough granularity, since the storage facility doesn't natively recognize formatted LUNs and therefore cannot back up single virtual machines.
Recognition of the LUN format depends on the storage management software we bought and on which file systems it supports. It may recognize NTFS-formatted LUNs, allowing us to back up VMware Server for Windows virtual machines, while it may not support VMFS, preventing us from doing the same with VMware ESX Server virtual machines.
If the LUN format is unrecognized or we don't have any enhanced storage management solution, we'll have to clone the whole LUN, which contains several virtual machines, obliging us to restore all of them even if just one is needed.
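The granularity problem is easy to see in a small sketch. The layout, the VM names and the restore_scope function are all hypothetical and only model the reasoning above: when the storage software cannot read a LUN's format, the unit of restore is the whole LUN.

```python
def restore_scope(lun_layout, wanted_vm, format_recognized):
    """Return the VMs that must be restored in order to get wanted_vm back.

    lun_layout maps a LUN name to the list of VMs stored on it.
    format_recognized maps a LUN name to True when the storage software
    can read its file system; missing entries count as unrecognized.
    """
    for lun, vms in lun_layout.items():
        if wanted_vm in vms:
            if format_recognized.get(lun, False):
                return [wanted_vm]  # single-VM restore is possible
            return list(vms)  # whole-LUN clone: everything comes back
    raise KeyError(wanted_vm)


layout = {"lun0": ["web01", "db01", "mail01"], "lun1": ["file01"]}
print(restore_scope(layout, "db01", {"lun0": False}))
print(restore_scope(layout, "db01", {"lun0": True}))
```

With an unrecognized format, asking for one guest on lun0 drags its two neighbors along, which is why placing fewer VMs on each LUN is the usual way to limit the damage.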
So, that's the current state of VM backup, as I see it. I welcome any comments on my analysis of the situation. You can write to me via SearchServerVirtualization.com at firstname.lastname@example.org.
Now, let's move on to part two and two more challenging areas in virtualization: failover and clustering.
About the author: Alessandro Perilli is a recognized IT security and virtualization technology analyst. He is CISSP certified and is also certified in Check Point, Cisco, Citrix, CompTIA, Microsoft, and Prosoft. In 2006 he received the Microsoft Most Valuable Professional (MVP) award for security technologies. Perilli pioneered modern virtualization evangelism, and is the founder of the well-known blog virtualization.info. Alessandro Perilli is also the founder of the False Negatives project, a high quality IT security consulting and training business in Italy.