Clustering problems with Hyper-V VM configuration files, VM states

This four-part series focuses on clustering problems with Microsoft Hyper-V virtual machines (VMs). Part one covered how firmware, drivers, patches and updates affect virtual host cluster stability. Part two offers my personal workarounds for two Hyper-V clustering problems; applying them has improved the overall stability of my virtual environment.

Clustering problem No. 1: Unsynchronized VM states
Recently, I had problems with my HP Virtual Connect firmware. I experienced prolonged public and private network interface card outages that caused nodes to sense other host failures in the cluster. As a result, VMs attempted to restart on alternate nodes.

In some instances, Failover Cluster Manager would display VMs in a "saving" or "starting" state. Hyper-V Manager would show these VMs as saved or running, but this information would not register in Failover Cluster Manager.
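The mismatch described above can be expressed as a simple comparison. The following Python sketch is only an illustration: the VM names, the two dictionaries and the function name are hypothetical, and it assumes you have already collected each tool's view of the VM states by some other means.

```python
# States the cluster view gets stuck in, per the description above.
TRANSITIONAL = {"saving", "starting"}
# States Hyper-V Manager reports for the same VMs.
SETTLED = {"saved", "running"}


def find_unsynced(cluster_states, hyperv_states):
    """Return VMs that look transitional to the cluster but settled to Hyper-V."""
    unsynced = []
    for name, cluster_state in cluster_states.items():
        hyperv_state = hyperv_states.get(name, "").lower()
        if cluster_state.lower() in TRANSITIONAL and hyperv_state in SETTLED:
            unsynced.append(name)
    return sorted(unsynced)


# Hypothetical sample data: what each tool reports for three VMs.
cluster_view = {"vm-web01": "Saving", "vm-sql01": "Running", "vm-app01": "Starting"}
hyperv_view = {"vm-web01": "Saved", "vm-sql01": "Running", "vm-app01": "Running"}

print(find_unsynced(cluster_view, hyperv_view))
```

Any VM this returns is a candidate for one of the workarounds that follow.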

The following three workarounds sync the correct VM state with the cluster:
  • In Hyper-V Manager, start (resume) the VM that is in a saved state. Then, manually save the VM again in Hyper-V Manager. Most of the time, this triggers the cluster to show the true VM state.
  • The second workaround is similar. Instead of manually starting the VM from its saved state, shut it down in Hyper-V Manager. At times, this releases the VM's hung state within Failover Cluster Manager.
  • The third option is more involved. A Microsoft TechNet forum suggests using Sysinternals Process Monitor to locate the VMWP.exe process associated with the troublesome VM. Killing this process crashes the VM, which then restarts on another cluster node and syncs its state in Failover Cluster Manager. It's not the best option, but sometimes a hammer is necessary. It also beats having to kill other cluster services that affect every VM on a node.
(Note: I use Hyper-V Manager because Failover Cluster Manager and System Center Virtual Machine Manager are not functional with VMs in this problem state. Hyper-V Manager is responsive and accurately displays the VM's true state.)
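To illustrate the lookup step in the third workaround, here is a minimal Python sketch. It assumes that each VMWP.exe worker process carries its VM's GUID on its command line and that you have already exported a process list as (PID, image name, command line) tuples. The function name, the tuple layout and the sample GUIDs are hypothetical. The sketch only identifies the process; it does not kill anything.

```python
def find_worker_pids(processes, vm_guid):
    """Return PIDs of vmwp.exe processes whose command line mentions vm_guid.

    processes: iterable of (pid, image_name, command_line) tuples
               (a hypothetical input shape, not a Hyper-V API).
    vm_guid:   the VM's GUID, with or without surrounding braces.
    """
    wanted = vm_guid.strip("{}").lower()
    matches = []
    for pid, image_name, command_line in processes:
        if image_name.lower() != "vmwp.exe":
            continue  # ignore everything except VM worker processes
        if wanted in command_line.lower():
            matches.append(pid)
    return matches


# Made-up process list for illustration only.
processes = [
    (4120, "vmwp.exe", r"C:\Windows\System32\vmwp.exe 6F1D3C2A-9B7E-4E0A-8C55-0A1B2C3D4E5F"),
    (5232, "vmwp.exe", r"C:\Windows\System32\vmwp.exe A0B1C2D3-1111-2222-3333-444455556666"),
    (880, "vmms.exe", r"C:\Windows\System32\vmms.exe"),
]

print(find_worker_pids(processes, "{a0b1c2d3-1111-2222-3333-444455556666}"))
```

Matching on the GUID rather than the VM name avoids killing the wrong worker when several VMs run on the same node.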

Clustering problem No. 2: Orphaned VM configuration files
After an unexpected VM failover, a few manual cleanup routines are necessary to return Hyper-V virtual cluster environments to their top efficiency levels.

One cleanup task involves deleting leftover link files found at C:\ProgramData\Microsoft\Windows\Hyper-V\Virtual Machines. These link files point Hyper-V to the location of the VM Extensible Markup Language (XML) configuration files.

During a planned or controlled failover, the link files are deleted after the VMs shift to another node. When an unexpected failure occurs, however, the failed node's VM link files are orphaned. The orphaned link files have little effect on a system, but I've seen instances when a quick migration of a failed VM back to a previous node causes a failure. The biggest nuisance of orphaned VM link files, though, is the continual appearance of event ID 4096 errors in the event log, as seen in Figure 1.

Figure 1

These event log errors point directly to the files that need to be deleted. In this example, notice the hardware configuration globally unique identifier (GUID). At this location, there will be a link file with the same GUID as the one in the error message. Delete this orphaned VM link file, and the event log error will stop recurring.

Figure 2

Be careful, though. If you delete an active link file, that VM will crash, and you will have to add the VM back to the cluster.

After the orphaned VM link files are deleted, the error messages will stop and the VM failover process will be more stable.
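The matching step can be sketched in Python. This is a minimal illustration under several assumptions: link files are named after the VM GUID, you have copied the text of the relevant event log errors into a list, and you maintain your own list of GUIDs for VMs that are currently active on the node. The function name and the sample data are hypothetical, and the sketch only reports candidates. It never deletes a file, because removing an active link file crashes that VM.

```python
import re
import tempfile
from pathlib import Path

# Matches a GUID such as 6F1D3C2A-9B7E-4E0A-8C55-0A1B2C3D4E5F.
GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def orphan_candidates(link_dir, event_messages, active_guids):
    """List link files whose GUID appears in an error message and is not active."""
    reported = {g.upper() for msg in event_messages for g in GUID_RE.findall(msg)}
    active = {g.strip("{}").upper() for g in active_guids}
    candidates = []
    for path in sorted(Path(link_dir).iterdir()):
        guid = path.stem.upper()  # assumes the file is named <GUID>.<extension>
        if guid in reported and guid not in active:
            candidates.append(path)
    return candidates


if __name__ == "__main__":
    # Demo against a throwaway folder with made-up GUIDs, not a real host.
    stale = "6F1D3C2A-9B7E-4E0A-8C55-0A1B2C3D4E5F"
    live = "A0B1C2D3-1111-2222-3333-444455556666"
    with tempfile.TemporaryDirectory() as tmp:
        for guid in (stale, live):
            (Path(tmp) / f"{guid}.xml").write_text("")
        messages = [f"sample error text mentioning {stale}"]
        for path in orphan_candidates(tmp, messages, active_guids=[live]):
            print("review before deleting:", path.name)
```

On a real host, you would point `link_dir` at the Virtual Machines folder named earlier and review each candidate by hand before deleting it.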

Stay tuned, because in part three of this series, I present more personal fixes for Hyper-V virtual machine cluster problems. Until then, send me any feedback or issues you have seen.

About the expert
Rob McShinsky is a senior systems engineer at Dartmouth Hitchcock Medical Center in Lebanon, N.H., and has more than 12 years of experience in the industry -- including a focus on server virtualization since 2004. He has been closely involved with Microsoft as an early adopter of Hyper-V and System Center Virtual Machine Manager 2008, as well as a customer reference. In addition, he blogs at VirtuallyAware.com, writing tips and documenting experiences with various virtualization products.


This was first published in December 2009
