The details and dangers of using Hyper-V checkpoints

Reduced performance and potential disk shortages can cripple the very VMs you're hoping to protect if you don't use Hyper-V checkpoints correctly.

When I first heard about virtual machine snapshots, I intuitively assumed that an entire copy of the current state of the VM was made and put off to the side. That way, if the snapshot ever needed to be restored, the VM could just load that copy instead. After working with them and understanding the process, I found out I was very wrong.

To clarify naming conventions, Microsoft has renamed "snapshots" to "checkpoints" as of Windows Server 2012 R2 Hyper-V. System Center Virtual Machine Manager has always referred to them as "checkpoints" so it's good to see Microsoft moving to a single term -- however, PowerShell commands still refer to snapshots. There is no technical difference between the two.

For Hyper-V on Windows Server 2008 and 2012, Microsoft recommends using Hyper-V checkpoints only in test and development, not in a production environment. Checkpoints are supported in production, however, unless the VM is running Exchange or SQL, which makes it hard for system administrators to decide whether to use them on production systems at all. Not much has changed for Windows Server 2012 onward, and Microsoft has taken this cautious approach for a few different reasons.

The first reason is the reduction in server performance whenever there is a checkpoint, due to increased I/O demands of having the checkpoint available. This can be reduced by saving checkpoints to a different disk than the VM's VHD file, but there will still be some overhead.

The second reason is that each checkpoint takes up extra disk space, and that space is not released until the VM is shut down. This can have disastrous effects on the running of your environment, which I will explain later in this article.

Hyper-V checkpoints under the hood

So what actually happens when you click the Checkpoint option on a VM from the Hyper-V Manager?

First, Hyper-V creates a differencing disk with an .AVHD file extension. The location of this file is based on the path configured for checkpoints. It starts out as a rather small file (32 MB in my testing), but in the background, the original .VHD file for the VM stops receiving writes. Second, a copy of the configuration file, which has an .XML file extension, is made to cover any hardware changes to the VM itself. Third, the current state of the memory is saved to another file with a .BIN extension, which allows the checkpoint to be restored exactly as it was at the time. The fourth file has a .VSV extension and holds the saved state of the devices associated with the VM.
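
The same action can be started from PowerShell instead of Hyper-V Manager. The following is a minimal sketch, assuming an elevated session on the Hyper-V host with the Hyper-V module installed; the VM name and checkpoint name are hypothetical placeholders.

```powershell
# Assumption: elevated session on the Hyper-V host; 'TestVM' is a hypothetical VM name.
$vmName = 'TestVM'

# Create a checkpoint (the parameter still uses the older "snapshot" wording).
Checkpoint-VM -Name $vmName -SnapshotName 'Before-change'

# List the files found under the VM's configured checkpoint path.
$checkpointPath = (Get-VM -Name $vmName).SnapshotFileLocation
Get-ChildItem -Path $checkpointPath -Recurse -File |
    Select-Object Name, Extension, Length, LastWriteTime
```

Depending on the Windows Server version, the differencing disk may be stored beside its parent virtual disk rather than under the checkpoint path, so it may not appear in this listing.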

The VM stays available and running, apart from the performance hit, but the Hyper-V host starts to juggle between the .VHD and .AVHD for reads and writes.

When a read request occurs on the VM, the host first checks the differencing disk to see if it has a record of the data for that request. If it doesn't, the host then reads the data from the original .VHD file. When a write request occurs, the host writes the change to the .AVHD file. The following example is very basic (imagine millions of 1s and 0s), but this is what's going on with the VM data:

0001110001 - Original VHD
_____0____ - AVHD Checkpoint 1
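
The read and write rules can be modeled in a few lines of PowerShell. This is a toy illustration only, not how Hyper-V is implemented: a string stands in for the original VHD, a hashtable stands in for the differencing disk, and the function names `Write-Block` and `Read-Block` are made up for this sketch.

```powershell
# Toy model: one character per "block". Nothing here touches a real disk.
$vhd  = '0001110001'   # original VHD, no longer written to once a checkpoint exists
$avhd = @{}            # differencing disk: block index -> new value, starts empty

function Write-Block {
    param([hashtable]$DiffDisk, [int]$Index, [char]$Value)
    # Writes always land in the differencing disk, never in the parent.
    $DiffDisk[$Index] = $Value
}

function Read-Block {
    param([string]$ParentDisk, [hashtable]$DiffDisk, [int]$Index)
    # Reads check the differencing disk first, then fall back to the parent.
    if ($DiffDisk.ContainsKey($Index)) { return $DiffDisk[$Index] }
    return $ParentDisk[$Index]
}

# The VM changes the sixth block (index 5) from 1 to 0.
Write-Block -DiffDisk $avhd -Index 5 -Value '0'

# Rebuild what the VM sees by reading every block through the overlay.
$view = -join (0..($vhd.Length - 1) | ForEach-Object {
    Read-Block -ParentDisk $vhd -DiffDisk $avhd -Index $_
})
"VM sees : $view"
"VHD file: $vhd"
```

The VM sees 0001100001 while the VHD string still holds 0001110001, which is the situation pictured in the example above.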

Each time data changes, the .AVHD keeps a record of that change only. When you start getting to multiple Hyper-V checkpoints, things can get a bit messier:

0001110001 - Original VHD
_____0____ - AVHD Checkpoint 1
_____11___ - AVHD Checkpoint 2
_1____0___ - AVHD Checkpoint 3

Each checkpoint is a separate .AVHD file and tracks the changes from the point in time it was created until either the checkpoint is deleted or a new checkpoint is made. In the example above, when Checkpoint 2 is created, Checkpoint 1 becomes read-only. When Checkpoint 3 is created, Checkpoint 2 becomes read-only and Checkpoint 1 stays read-only, just like the original VHD.
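
On a real host, the chain of differencing disks behind a VM can be inspected by following each disk's parent link. This is a hedged sketch that assumes an elevated session on the Hyper-V host, a hypothetical VM named 'TestVM', and that every attached drive is a virtual disk file rather than a pass-through disk.

```powershell
# Assumption: elevated session on the Hyper-V host; 'TestVM' is a hypothetical VM name.
$vmName = 'TestVM'

foreach ($drive in Get-VMHardDiskDrive -VMName $vmName) {
    $path = $drive.Path
    # Follow ParentPath links from the active differencing disk back to the base disk.
    while ($path) {
        $disk = Get-VHD -Path $path
        [pscustomobject]@{
            File   = Split-Path -Path $disk.Path -Leaf
            Type   = $disk.VhdType
            SizeGB = [math]::Round($disk.FileSize / 1GB, 2)
            Parent = $disk.ParentPath
        }
        $path = $disk.ParentPath   # empty for the base disk, which ends the loop
    }
}
```

Each row whose type is Differencing corresponds to one link in the chain; the last row for each drive is the original virtual disk.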

As you can see, disk usage can quickly grow out of control the more checkpoints you have, and in turn, performance will worsen. Although not much has changed from the original set of data, there's already 50% extra space required to track the three checkpoints.
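
The 50% figure can be checked by counting the blocks each checkpoint recorded in the toy example. The sketch below reuses the article's strings and treats every character other than "_" as one recorded block; it is arithmetic on the illustration, not a measurement of real disks.

```powershell
$vhd         = '0001110001'
$checkpoints = '_____0____', '_____11___', '_1____0___'

# Count the blocks each checkpoint actually recorded (anything that is not '_').
$changed = foreach ($cp in $checkpoints) {
    @($cp.ToCharArray() | Where-Object { $_ -ne '_' }).Count
}

$total    = ($changed | Measure-Object -Sum).Sum
$overhead = 100 * $total / $vhd.Length

"Blocks recorded per checkpoint: $($changed -join ', ')"
"Extra space: $overhead% of the original disk"
```

The three checkpoints record 1, 2 and 2 blocks, so 5 extra blocks are stored against an original of 10.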

Windows Server operating systems do a lot of tasks in the background, and all of those small writes and changes add up surprisingly quickly.

It is also worth noting that a single checkpoint file can only grow to the size of the original VHD:

0001110001 - Original VHD
1110001110 - AVHD Checkpoint 1

In this example, every bit of data is different. There is no way any more disk space can be used without creating another checkpoint. This is one of the biggest risks with Hyper-V checkpoints. If you run out of disk space, all VMs that depend on that disk will change to a "Paused-Critical" state, which of course is a very bad thing when running in production. A paused VM is as good as turned off from a user's point of view, and VMs in this state cannot be resumed until sufficient disk space is available.

The amount of disk space to allocate for checkpoints is hard to judge, but best practice is to keep them on a different disk from the VM’s VHD file. In a scenario where a checkpoint uses all available space, it will not affect other VMs, as they will still have plenty of disk space, unless of course all your VMs have active checkpoints on the same drive.
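
To see where each VM currently keeps its checkpoints and how much room those volumes have left, something like the following can help. It is a sketch that assumes an elevated session on the Hyper-V host; the VM name and target folder in the last command are hypothetical, and Hyper-V may refuse to change the location while the VM still has checkpoints.

```powershell
# Assumption: elevated session on the Hyper-V host.
# Show where each VM stores its checkpoints.
Get-VM | Select-Object Name, SnapshotFileLocation

# Show free space per lettered volume, in GB.
Get-Volume | Where-Object DriveLetter |
    Select-Object DriveLetter,
        @{ Name = 'FreeGB'; Expression = { [math]::Round($_.SizeRemaining / 1GB, 1) } },
        @{ Name = 'SizeGB'; Expression = { [math]::Round($_.Size / 1GB, 1) } }

# Hypothetical VM name and folder: point the VM's checkpoints at a different disk.
Set-VM -Name 'TestVM' -SnapshotFileLocation 'E:\Hyper-V\Checkpoints'
```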

You can easily check if there are currently active checkpoints on each Hyper-V host with this PowerShell command:

Get-VM | Get-VMSnapshot

This will list all checkpoints, so you can then easily navigate to the VMs needing their checkpoints removed.
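
A slightly longer variant adds each checkpoint's age, which makes forgotten checkpoints easier to spot. This is a sketch assuming an elevated session on the Hyper-V host; the `AgeDays` column is a calculated property introduced here, not something the cmdlet returns.

```powershell
# Assumption: elevated session on the Hyper-V host.
Get-VM | Get-VMSnapshot |
    Sort-Object CreationTime |
    Select-Object VMName, Name, CreationTime,
        @{ Name = 'AgeDays'; Expression = { [math]::Floor(((Get-Date) - $_.CreationTime).TotalDays) } }
```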

Limiting and removing checkpoints

If you decide that checkpoints are too risky, or want to have checkpoints only on particular VMs, you can set the checkpoint path to a path that doesn't exist. Just make sure staff doesn’t have access to change the setting.

In Hyper-V Manager, highlight the VM you want to delete checkpoints from and right-click a checkpoint in the "Checkpoints" pane. From the context menu, choose the "Delete Snapshot" option (labeled "Delete Checkpoint" on versions that use the newer name). This plays back the changes into the original VHD file and then deletes the other files created during checkpoint creation. Note that if you are still using Windows Server 2008, the playback does not occur until the VM is shut down. In Windows Server 2012, it happens live.
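
Checkpoints can also be deleted from PowerShell. The commands below are alternatives rather than a script to run top to bottom, and they assume an elevated session on the Hyper-V host; 'TestVM' and 'Before-change' are hypothetical names.

```powershell
# Assumption: elevated session; 'TestVM' and 'Before-change' are hypothetical names.

# Preview what would be deleted without changing anything.
Get-VMSnapshot -VMName 'TestVM' -Name 'Before-change' | Remove-VMSnapshot -WhatIf

# Delete one checkpoint by name.
Remove-VMSnapshot -VMName 'TestVM' -Name 'Before-change'

# Delete every checkpoint on the VM.
Get-VMSnapshot -VMName 'TestVM' | Remove-VMSnapshot
```

Running the `-WhatIf` form first is a cheap way to confirm that the right checkpoint is selected before anything is merged.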
