Sometimes all it takes is a system setup or configuration oversight to impede a virtual machine migration or restart, and a failed VM migration has a detrimental effect on data center efficiency and availability.
Abstracted from the underlying server hardware, VMs are easy to protect and migrate from one server to another. Virtualization, however, does not guarantee flawless reliability. Take a look at five of the most common causes of VM migration and startup failures.
1. Inadequate server resources
To start a VM, you need available computing resources. Insufficient or over-committed resources may cause the VM to fail immediately. This can easily occur on servers with heavily over-committed memory or with excess CPU reservation that doesn't provide the VM with enough resources. Administrators usually see these types of resource problems on heavily consolidated servers or high availability clusters, or when migrating VMs to other highly utilized servers without allowing adequate computing capacity for failover.
A server upgrade would add resources, but performing workload balancing is a better solution. Redistributing one or more VMs between servers will free adequate resources to ensure successful VM startup.
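The capacity check behind workload balancing can be sketched in a few lines. This is a minimal illustration, not a hypervisor API: the `VM` and `Host` classes, the `pick_destination` helper and the 10% headroom figure are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VM:
    name: str
    cpu_mhz: int     # CPU reservation the VM needs in order to start
    memory_mb: int   # memory the VM needs in order to start


@dataclass
class Host:
    name: str
    cpu_mhz: int     # total CPU capacity of the server
    memory_mb: int   # total physical memory of the server
    vms: List[VM] = field(default_factory=list)

    def free_cpu(self) -> int:
        return self.cpu_mhz - sum(vm.cpu_mhz for vm in self.vms)

    def free_memory(self) -> int:
        return self.memory_mb - sum(vm.memory_mb for vm in self.vms)

    def can_start(self, vm: VM, headroom: float = 0.10) -> bool:
        """True if the VM fits and `headroom` of total capacity stays unused."""
        cpu_ok = self.free_cpu() - vm.cpu_mhz >= self.cpu_mhz * headroom
        mem_ok = self.free_memory() - vm.memory_mb >= self.memory_mb * headroom
        return cpu_ok and mem_ok


def pick_destination(vm: VM, hosts: List[Host]) -> Optional[Host]:
    """Return the eligible host with the most free memory, or None."""
    candidates = [h for h in hosts if h.can_start(vm)]
    return max(candidates, key=lambda h: h.free_memory(), default=None)
```

If `pick_destination` returns `None`, no server in the list can take the VM without eating into the reserved headroom, which is the point at which redistributing other VMs or adding capacity becomes necessary.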
2. Incompatible server hardware
Virtualization abstracts workloads from the underlying hardware, but the hardware must still provide the critical features and functionality those workloads require. Startup failures can occur when you migrate a VM to an older server that may lack hardware features the VM needs.
Suspending a VM uses CPU-specific power management states; if you migrate the VM to a server with CPUs that lack these power management states, the VM will not restart properly. In this case, you may need to manually restart the VM using command line options or migrate the VM to another physical server with similar CPU capabilities and restart the VM there. You would then migrate the running VM to the desired server.
You may also find that CPUs that lack virtualization extensions, such as Intel VT or AMD-V, or that have those extensions disabled, won't support VMs. Before you migrate or start a VM, verify that the destination server provides virtualization extensions, and be sure to enable those extensions in the BIOS.
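On a Linux-based host, one quick way to verify the extensions is to look for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo`. The sketch below parses text in that format; the function name is an assumption for illustration. Depending on the kernel version, a flag may still be listed when the extension is disabled in firmware, so treat a positive result as necessary rather than sufficient.

```python
# CPU flag names as they appear in /proc/cpuinfo on x86 Linux hosts.
VIRT_FLAGS = {"vmx": "Intel VT-x", "svm": "AMD-V"}


def virtualization_support(cpuinfo_text: str) -> set:
    """Return the virtualization extensions advertised in cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = value.split()
            found.update(name for flag, name in VIRT_FLAGS.items() if flag in flags)
    return found


# Typical use on the destination server (not executed here):
#   with open("/proc/cpuinfo") as f:
#       print(virtualization_support(f.read()) or "no virtualization flags found")
```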
IT personnel will need to adjust migration plans over time to ensure the use of compatible server hardware. In some cases, IT pros may be able to edit the VM to remove certain CPU feature requirements.
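Planning for compatible hardware comes down to comparing feature sets. The following sketch assumes you have already collected each server's CPU feature names into a dictionary; the host names, feature names and both helper functions are hypothetical.

```python
def common_baseline(host_features: dict) -> set:
    """Features present on every host (empty set if no hosts are given)."""
    feature_sets = [set(features) for features in host_features.values()]
    return set.intersection(*feature_sets) if feature_sets else set()


def incompatible_hosts(vm_required: set, host_features: dict) -> dict:
    """Map each host that cannot run the VM to the features it is missing."""
    result = {}
    for host, features in host_features.items():
        missing = set(vm_required) - set(features)
        if missing:
            result[host] = missing
    return result
```

A VM whose requirements stay within `common_baseline` can start on any server in the group; anything reported by `incompatible_hosts` is either a server to exclude from the migration plan or a feature requirement to consider removing from the VM.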
3. Conflicting VM tasks
Some virtualization-related tasks take a considerable amount of time to complete and continue to run in the background even after generating a timeout error. Attempting to start a VM while another virtualization task is still running may result in a server error. For example, a VM may not restart while the hypervisor consolidates disks after you delete unneeded snapshots. Adjusting the timeout settings in VM configuration files will allow critical activities to continue uninterrupted, but you may also need to reschedule background tasks so that they do not overlap with VM migrations and restarts, or so that they run during off hours.
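One way to avoid the conflict in automation is to wait for blocking tasks to finish before issuing the start request. The sketch below is generic: `list_active_tasks` is a hypothetical callable you would back with your hypervisor's task query, and the task names and timings are assumptions.

```python
import time


def wait_for_tasks(list_active_tasks, blocking_kinds, timeout_s=600.0,
                   poll_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until no blocking task is active.

    Returns True when it is safe to start the VM, or False if blocking
    tasks are still running once the timeout has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        blocking = [task for task in list_active_tasks() if task in blocking_kinds]
        if not blocking:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A caller would start the VM only when the function returns `True`, and otherwise log the conflict and retry later rather than forcing the start.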
4. VM file damage
VMs are little more than images in memory that are saved to disk in the form of specific VM file formats, such as .vmx and .vmdk. As with any disk-based storage, problems with the disk storage subsystem or the network connecting storage and servers can damage VM files. When an essential file needed for a VM is missing, locked, damaged or otherwise unavailable, the VM cannot start up.
File locking, used to prevent concurrent tasks from making unexpected changes to files in use, frequently causes these errors. In some circumstances, a VM component file remains locked and prevents the VM from starting on another server. You can identify a locked file and remove the lock, but this detailed procedure requires expertise on your specific hypervisor and data center environment. It is more common to recover the VM from a recent snapshot or other backup.
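Before turning to a backup, a basic file check can rule out the simplest cases. The sketch below assumes a directory that holds one VM's `.vmx` and `.vmdk` files and reports files that are missing, empty or unreadable. It does not detect or clear hypervisor-level locks, which still require the hypervisor-specific procedure; the function name and message wording are illustrative.

```python
from pathlib import Path

# File types the article names as essential to a VM.
REQUIRED_SUFFIXES = (".vmx", ".vmdk")


def check_vm_files(vm_dir):
    """Return a list of human-readable problems found in a VM directory."""
    problems = []
    directory = Path(vm_dir)
    for suffix in REQUIRED_SUFFIXES:
        matches = sorted(directory.glob(f"*{suffix}"))
        if not matches:
            problems.append(f"missing {suffix} file")
        for path in matches:
            try:
                if path.stat().st_size == 0:
                    problems.append(f"{path.name} is empty")
                    continue
                with path.open("rb") as handle:
                    handle.read(1)  # confirm the file can actually be opened and read
            except OSError as exc:
                problems.append(f"{path.name} is unreadable: {exc}")
    return problems
```

An empty list means only that the files exist and can be read from this machine; it says nothing about whether their contents are intact.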
5. Licensing, administrative input and other issues
Other issues that prevent VM startups, such as an unexpected pause for user input, have nothing to do with server capabilities or file integrity. The VM may begin a normal and successful startup, but pause before completion, waiting for administrative input. Once you answer the question, VM startup continues. By reconfiguring the VM to automatically answer routine questions and eliminate manual intervention, you'll sidestep the pause in VM startup. This process varies among hypervisors and requires administrative expertise.
Licensing is also frequently overlooked, and problems usually result from an oversight during hypervisor installation. For example, when a VM is deployed on a newly virtualized server, the new server's license source may not be configured properly. In other cases, the license may be damaged or corrupted, or the organization may simply be out of licenses. As such, you must ensure you have enough licenses to support the virtualized servers in use, verify that the license is configured for the server and see that the license key is intact and undamaged.
Accessing the hypervisor's log file or management logs can also speed the troubleshooting process. If logs don't clarify the issue, the root cause is likely one of the aforementioned problems.
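A first pass over a log can be automated by counting lines that match keywords for each of the five problem areas. The keyword patterns below are illustrative assumptions, not actual hypervisor messages; adjust them to the wording your hypervisor's logs really use.

```python
import re
from collections import Counter

# Hypothetical keyword patterns, one per problem area covered above.
CATEGORIES = {
    "resources": re.compile(r"insufficient|out of memory|not enough", re.I),
    "hardware": re.compile(r"incompatible|unsupported cpu", re.I),
    "tasks": re.compile(r"timed? ?out|another task", re.I),
    "files": re.compile(r"lock|corrupt|no such file", re.I),
    "licensing": re.compile(r"licen[sc]e", re.I),
}


def classify_log(lines):
    """Count log lines that match each problem category."""
    counts = Counter()
    for line in lines:
        for category, pattern in CATEGORIES.items():
            if pattern.search(line):
                counts[category] += 1
    return counts
```

The category with the highest count is only a hint about where to look first; it does not replace reading the matching log lines in context.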
This was first published in June 2013