Virtual machine (VM) sprawl results in poor resource utilization and a badly organized infrastructure. When virtual machines begin to proliferate, IT staff can easily lose track of how many are running, which server hosts them, where their storage is located and, most commonly, what each VM was intended to do. This growing problem may be one of the biggest issues haunting IT staff as server virtualization continues to gain momentum.
Before virtual machines, there was a natural cap on the rate at which the server farm could grow. Physical servers had to be purchased, connected and loaded with an OS and applications. Now a virtual server can be acquired almost instantly, comes already connected, and loading the OS and applications takes almost no time at all.
Problems caused by VM sprawl
Virtual machine growth results in poorly utilized servers sitting on the virtualization host. Even if the deployment of systems in the virtual environment is somehow tracked, most companies have no way to tell which virtual machines are used enough to justify the disk space they consume. The problem is that an idle virtual machine, even one that is doing absolutely nothing, is not really idle. Its mere presence consumes CPU cycles, memory, disk space, backup process time and bandwidth.
Consideration should not be limited to completely idle servers. A virtual server that only runs a few minor tasks a few times a day should be examined as well. These lightly used VMs and whatever application software they run also consume RAM and disk resources. It would be more cost-effective to consolidate these minor processes onto one or two virtual machines instead.
Another concern is that VM sprawl causes virtual machine disk images to consume expensive primary storage capacity, typically Fibre Channel-attached storage area network (SAN) disk, which can cost as much as $20 per GB. This is far too expensive for a VM that is doing next to nothing.
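As a rough illustration of the problem, the following Python sketch flags lightly used virtual machines and estimates what their disk images cost on primary storage. The inventory records, the field names and the 2% CPU threshold are all hypothetical; only the $20 per GB figure comes from the discussion above, and real utilization data would have to come from whatever monitoring tool is in place.

```python
from dataclasses import dataclass

# Upper-bound cost of Fibre Channel SAN storage quoted in the article.
PRIMARY_COST_PER_GB = 20.0


@dataclass
class VmUsage:
    name: str
    avg_cpu_percent: float  # average CPU use over the sampling window
    disk_gb: float          # size of the VM's disk images on primary storage


def archive_candidates(vms, cpu_threshold=2.0):
    """Return (name, primary storage cost) for VMs below the CPU threshold.

    The threshold is an assumed value for illustration, not a recommendation.
    """
    return [
        (vm.name, vm.disk_gb * PRIMARY_COST_PER_GB)
        for vm in vms
        if vm.avg_cpu_percent < cpu_threshold
    ]


# Hypothetical inventory; a real list would be exported from a monitoring tool.
inventory = [
    VmUsage("web-prod", 35.0, 40.0),
    VmUsage("test-2006", 0.4, 60.0),
    VmUsage("report-nightly", 1.5, 25.0),
]

for name, cost in archive_candidates(inventory):
    print(f"{name}: ${cost:,.0f} of primary storage")
```

CPU usage alone is a crude signal. In practice, memory, disk I/O and network activity over a longer window would also be worth checking before a machine is treated as idle.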
Virtual machine archiving
For the problem of virtual machine sprawl, systematic archiving of virtual machines may offer the best solution. The concept of virtual machine archiving is new but the tools to perform the task are not. Virtual machine archiving takes advantage of the method used to deploy virtual machines: creating what are essentially big files that represent disk images. These disk images can be archived and the archive can be managed similarly to traditional archive data sets.
The rapid adoption of disk-based archiving has made virtual machine archiving even more practical. The tools to move data to a disk target are readily available today and come standard with environments such as VMware. So significant is the issue of virtual machine sprawl that VMware itself is introducing VM Lifecycle Manager, which will also support disk-based archives.
A disk archive is more than a cheap RAID (redundant array of independent disks) system or a SATA (Serial Advanced Technology Attachment) disk added to an existing SAN (storage area network). It is a unit specifically designed to retain data on disk for long periods of time. These systems provide scalability, cost effectiveness, high availability and security that standard disk solutions cannot.
Particularly effective in virtual machine archiving is sub-file or block-level data deduplication of disk-based archives. By leveraging this technology, a disk-based archive can dramatically reduce the amount of storage used by the archived virtual machines. Data deduplication examines the segments of data as they are sent to the disk archive. These segments are then compared to other segments already stored on the disk. Each segment is stored only once, and redundant segments are linked to the copy already on disk. VMware images are highly redundant and, as a result, their storage on a disk archive with data deduplication capabilities will be highly efficient. It is typical to store 20TB of virtual machine archives in 2TB of physical disk capacity, a 10:1 reduction. This drives the cost of disk-based archives well below the $4 per GB mark, a potential savings of $16 per GB compared with primary storage at $20 per GB.
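The segment comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular archive product works: the fixed 4KB segment size, the use of SHA-256 digests to detect identical segments, and the function names are all assumptions made for the example.

```python
import hashlib


def dedup_store(images, segment_size=4096):
    """Split each image into fixed-size segments and keep each unique one once."""
    store = {}    # digest -> segment bytes (each unique segment stored once)
    recipes = {}  # image name -> ordered digests (the links to stored segments)
    for name, data in images.items():
        digests = []
        for offset in range(0, len(data), segment_size):
            segment = data[offset:offset + segment_size]
            digest = hashlib.sha256(segment).hexdigest()
            store.setdefault(digest, segment)  # skip segments already on disk
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def restore(name, store, recipes):
    """Rebuild an image by following its links back to the stored segments."""
    return b"".join(store[d] for d in recipes[name])


# Two hypothetical VM images that share the same eight "operating system"
# segments and differ only in their final segment.
os_blocks = b"".join(bytes([i]) * 4096 for i in range(8))
images = {"vm_a": os_blocks + b"A" * 4096, "vm_b": os_blocks + b"B" * 4096}

store, recipes = dedup_store(images)
logical = sum(len(data) for data in images.values())
physical = sum(len(segment) for segment in store.values())
print(f"logical {logical} bytes, physical {physical} bytes")
```

Here 18 logical segments collapse to 10 stored ones because the shared operating system segments are kept only once. The more images that share a common base, the higher the ratio climbs, which is why archives of similar virtual machines deduplicate so well.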
Implementing VM archives
There are two main steps in implementing a disk-based archive for virtual machines. The first step is identification. It is worth the effort, even manually, to identify these orphaned virtual systems. However, there are tools available from companies like PlateSpin, Tek-Tools and possibly VMware Lifecycle Manager for automating the identification of these systems. The second step is movement of the virtual machine, for which VMware provides the vcbMounter utility. Most other virtualization technologies have a similar capability.
The vcbMounter utility is used to export a backup copy of a virtual machine. The program is invoked at the command line or from a script, with arguments that name the virtual machine to archive and the destination directory for the archive. This utility is typically used for backup to disk, but it is also ideal for sending the virtual machine image to a disk-based archive. All the files needed to recreate or restore the virtual machine are exported by vcbMounter to produce a complete file-system-consistent backup copy of the virtual machine.
Because it ensures this consistency, vcbMounter is better suited for backup and DR than other VMware-provided utilities such as vmkfstools. The challenge with the vmkfstools utility as it relates to virtual machine archiving is that it only works on virtual disk files; the VMware administrator must manage shutting down the virtual machine or creating a redo log for the virtual disk. It is not as effective as vcbMounter for virtual machine archiving.
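A scripted invocation might look something like the Python sketch below, which only builds and prints the command line rather than running it. The flags shown (-h, -u, -a, -r and -t fullvm) and the name: identifier prefix reflect common vcbMounter usage but should be treated as assumptions and checked against the VMware Consolidated Backup documentation for the installed version. The host, user, VM name and archive path are hypothetical, and credential handling is deliberately left out.

```python
import shlex


def vcb_mounter_command(vm_name, archive_dir, host, user):
    """Build a vcbMounter command line that exports a full VM image.

    All flags are assumptions to verify against the VCB documentation:
    -h host, -u user, -a VM identifier, -r destination, -t export type.
    """
    return [
        "vcbMounter",
        "-h", host,
        "-u", user,
        "-a", f"name:{vm_name}",           # identify the VM by display name
        "-r", f"{archive_dir}/{vm_name}",  # destination directory on the archive
        "-t", "fullvm",                    # export the whole virtual machine
    ]


# Hypothetical values for illustration only.
cmd = vcb_mounter_command(
    "test-2006", "/mnt/disk-archive", "vc.example.com", "archiver"
)
print(shlex.join(cmd))
```

Building the command as a list keeps the arguments safe to hand to a process launcher such as subprocess.run without shell quoting problems.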
If a virtual machine archived with vcbMounter is needed again, it can be recovered very quickly from the disk archive with the VMware tool vcbRestore. This tool imports a copy of a virtual machine created by vcbMounter and can completely recover the virtual machine to its original state on the original or an alternate ESX server host.
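The recovery step can be scripted in the same way. The sketch below assumes the restore utility is invoked as vcbRestore, the name used in VMware's Consolidated Backup documentation, and that -s names the directory holding the exported image. Both the flags and the example host, user and path are assumptions to verify before use, and the command is only printed, not executed.

```python
import shlex


def vcb_restore_command(vm_name, archive_dir, host, user):
    """Build a restore command line for an image exported earlier.

    Flags are assumptions to verify against the VCB documentation:
    -h host, -u user, -s source directory of the archived image.
    """
    return [
        "vcbRestore",
        "-h", host,
        "-u", user,
        "-s", f"{archive_dir}/{vm_name}",  # directory written by the export
    ]


# Hypothetical values for illustration only.
restore_cmd = vcb_restore_command(
    "test-2006", "/mnt/disk-archive", "esx01.example.com", "archiver"
)
print(shlex.join(restore_cmd))
```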
The speedy response of a disk archive allows for more aggressive archiving of virtual machines because images that are years old can be recovered in mere minutes. As a result, there is little risk in archiving an infrequently used virtual machine. The speed of archiving and recovery further expands the use beyond virtual machines that have gone permanently idle to machines that are only needed on a seasonal or peak basis. Once the seasonal need for those systems has come and gone, they can be archived to less expensive storage designed for retention. When the peak season returns, the image can be quickly found and recovered.
The disk archive can also be used for much more than virtual machine archiving. It can archive files, image data, email and even databases. If the same disk archive platform is used throughout, the data deduplication technology will be leveraged across all uses, and net storage growth will be minimized because redundant data will be removed regardless of whether it came from a virtual machine archive or a more traditional file archive process.
To sum up, virtual machine archiving is an effective and easy-to-implement method of controlling virtual machine growth. It can reduce the cost associated with managing too many virtual machines and reduce the consumption rate of the primary storage that the virtual environment requires.
About the author: George Crump is President and Founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.