VM errors: The case of the disappearing virtual machine

The root cause of a VM error can be difficult to pinpoint. Here's the tale of a randomly disappearing VM, and how to prevent this VM error in your environment.

Here's a story about a virtual machine (VM) problem that I overheard at an IT conference. Tell me if you've experienced this VM error before.

An IT organization made the jump to virtualization and successfully converted dozens of physical machines into virtual ones. It purchased more than a handful of virtual hosts, which provided the horsepower for its VMs. Then came the high-availability features so that VMs could fail over to other hosts in case of a problem, along with monitoring and load-balancing technologies to ensure the optimal distribution of resources.

With money left over, the company purchased additional hosts for future expansion: a smart move. In short, this organization was in good shape.

Disappearing virtual machines
But while all the pieces were correctly integrated and the VMs hummed along merrily, a VM would occasionally just disappear.

Now, this VM didn't vanish entirely. It was still listed in the platform's management console. Sometimes, however, the entire machine would drop offline. In other cases, this odd VM error caused only some of the VM's functionality to fail.

Neither the cluster settings nor the logs offered any answers. Troubleshooting the VM error messages only led to more dead ends.

What caused the VM error?

The answer to this VM error is in the storage -- specifically, in a part of the storage that has little to do with virtualization. If you've read my recent article on encapsulating Virtual Hard Disk data, you know that there's more than one way to present disk storage to VMs.

The simplest method requires the creation of another Virtual Hard Disk or Virtual Machine Disk inside a logical unit number (LUN) that's already exposed to a Hyper-V or ESX host. Adding an encapsulated disk file to the same LUN pretty much guarantees that a VM's second disk is always available if its first disk is around.

In some cases, though, it doesn't make sense to encapsulate secondary storage for VMs. Consider a file or Exchange server. Depending on your virtual platform and its version, it may make more sense to store data on pass-through disks or Raw Device Mapping.

After creating a secondary disk, you must logically separate it from the primary system disk on your storage area network. You must also expose the secondary disk to every possible location where the primary system disk -- and its associated VM -- might get hosted.
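
That exposure requirement can be audited with a few lines of script. The following is only a minimal sketch in Python, assuming you can export two inventories by hand or from your own tooling: the LUNs each VM depends on, and the LUNs each host can see. The names (find_exposure_gaps, fileserver01, host-c, lun-data-07) are hypothetical and do not come from any Hyper-V or ESX API.

```python
from typing import Dict, List, Set


def find_exposure_gaps(
    vm_luns: Dict[str, Set[str]],
    host_luns: Dict[str, Set[str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Return {vm: {host: [missing LUNs]}} for each host that cannot see
    every LUN a VM depends on. VMs with no gaps are left out."""
    gaps: Dict[str, Dict[str, List[str]]] = {}
    for vm, required in vm_luns.items():
        for host, visible in host_luns.items():
            missing = sorted(required - visible)
            if missing:
                gaps.setdefault(vm, {})[host] = missing
    return gaps


# Hypothetical inventory: every VM's system disk lives on a shared LUN,
# but the file server also has a pass-through data disk on its own LUN.
vm_luns = {
    "fileserver01": {"lun-system-01", "lun-data-07"},
    "web01": {"lun-system-01"},
}
host_luns = {
    "host-a": {"lun-system-01", "lun-data-07"},
    "host-b": {"lun-system-01", "lun-data-07"},
    "host-c": {"lun-system-01"},  # data LUN was never masked and zoned here
}

gaps = find_exposure_gaps(vm_luns, host_luns)
for vm, hosts in gaps.items():
    for host, missing in hosts.items():
        print(f"{vm}: {host} cannot see {', '.join(missing)}")
```

With this sample inventory, the only gap reported is fileserver01 on host-c. Any VM that shows up in the result is a candidate for the kind of failure described in this story.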

This leads to the cause of the mysteriously disappearing VM. The virtual infrastructure worked correctly; but when it migrated VMs to new hosts, it occasionally relocated a VM to a host that did not have access to that VM's secondary disk. Later, during the rebalancing process, the VM would migrate to a host that did have access -- which would explain its disappearance and reappearance.

Perhaps a storage administrator didn't mask and zone the secondary disk to the virtual host? Maybe someone simply forgot that the host needed access to the LUN?

If you're lucky, your virtual platform includes pre-migration verifications to ensure that this VM error doesn't take place. But nothing beats correctly architecting server-to-storage connections up front.
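
To illustrate what such a pre-migration verification does in principle, here is a rough Python sketch. It is not how any particular platform implements the check; the functions (missing_luns, safe_targets, migrate) and the host and LUN names are assumptions made up for this example.

```python
from typing import Dict, List, Set


def missing_luns(required: Set[str], visible: Set[str]) -> List[str]:
    """LUNs the VM needs that the candidate host cannot see."""
    return sorted(required - visible)


def safe_targets(required: Set[str], host_luns: Dict[str, Set[str]]) -> List[str]:
    """Hosts that can see every LUN the VM depends on."""
    return sorted(
        host for host, visible in host_luns.items()
        if not missing_luns(required, visible)
    )


def migrate(vm: str, target: str, required: Set[str],
            host_luns: Dict[str, Set[str]]) -> str:
    """Refuse the move if the target host lacks any required LUN."""
    missing = missing_luns(required, host_luns.get(target, set()))
    if missing:
        raise RuntimeError(
            f"refusing to move {vm} to {target}: missing {', '.join(missing)}"
        )
    return f"{vm} -> {target}"  # a real platform would start the move here


# Hypothetical data: the VM has a system disk and a pass-through data disk.
required = {"lun-system-01", "lun-data-07"}
host_luns = {
    "host-a": {"lun-system-01", "lun-data-07"},
    "host-b": {"lun-system-01", "lun-data-07"},
    "host-c": {"lun-system-01"},
}

print(safe_targets(required, host_luns))
try:
    migrate("fileserver01", "host-c", required, host_luns)
except RuntimeError as err:
    print(err)
```

The point of the sketch is the ordering: the storage check runs before the move, so a host that cannot see the secondary disk is rejected instead of quietly receiving the VM.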

The moral of this story: Be careful with your disk connections in virtual environments. Always double-check that every disk is properly exposed to any virtual host that might house the VM after a failover or migration. Otherwise, you may discover your own highly available -- yet unavailable -- VM.

Greg Shields

Greg Shields is an independent author, instructor, Microsoft MVP and IT consultant based in Denver. He is a co-founder of Concentrated Technology LLC and has nearly 15 years of experience in IT architecture and enterprise administration. Shields specializes in Microsoft administration, systems management and monitoring, and virtualization. He is the author of several books, including Windows Server 2008: What's New/What's Changed, available from Sapien Press.
