
Storage I/O control and other strategies to prevent VM storage problems

Ensure adequate VM storage resources by reducing VM count per LUN, using queue depth throttling, limiting disk requests, enabling SIOC features and monitoring storage latency.

VMs run in real time, relying primarily on CPU and memory resources. But storage remains an important resource: it holds inactive VM images -- which can later be loaded into CPU and memory -- along with snapshots that capture the point-in-time state of VMs and the data for applications running within each VM. Beyond the straightforward consideration of capacity, some applications, such as transactional databases, place extreme demands on storage performance as well. As a consequence, it's still important for administrators to consider various storage attributes and technologies, such as storage I/O control, to provide adequate VM resources.

Reduce VMs per LUN

It's certainly possible to place multiple VMs on the same logical unit number (LUN), but this can easily lead to storage contention as VMs -- and their resident workloads -- vie for attention from the physical storage device, such as a storage array. For example, if two VMs are on the same LUN and both VMs attempt to access the LUN simultaneously, one or both of the competing VMs might see a reduction in storage performance. More VMs will only exacerbate the potential problem.

Place fewer VMs on the same LUN wherever possible to reduce contention for I/O requests at the storage subsystem. For extremely latency-sensitive workloads, it might be best to dedicate one LUN per VM. However, carving out too many LUNs can overtax the storage subsystem with aggregate I/O traffic, causing it to return queue-full or busy errors -- effectively rejecting some I/O traffic -- which will also reduce storage performance.
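On an ESXi host, one way to gauge how many VMs share a LUN is to map each VMFS datastore to its backing device and then list the VMs registered on each datastore. A minimal sketch, assuming shell access to the host (exact output columns vary by ESXi version):

```shell
# Map each VMFS datastore to the device (LUN) that backs it.
esxcli storage vmfs extent list

# List registered VMs and the datastore each one lives on, to spot
# datastores -- and therefore LUNs -- hosting many VMs.
vim-cmd vmsvc/getallvms
```

Cross-referencing the two outputs shows which LUNs are the most crowded and are therefore the first candidates for redistribution.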

Use queue depth throttling if possible


A hypervisor like VMware ESXi supports control of storage queue depth (QD). Basically, QD is the number of commands that a storage host bus adapter can have outstanding at one time on a per-LUN basis. If the queue depth is exceeded, the storage device or subsystem generates I/O failure messages -- the queue is full -- and forces the system to retry those I/O requests, degrading performance for the VMs that rely on that storage. When administrators encounter frequent queue-full or busy errors, enabling the hypervisor's queue depth throttling functionality might improve storage performance by reducing the number of errors returned -- forcing fewer retries.
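On ESXi, adaptive queue depth throttling is enabled per device with esxcli. The sketch below uses a placeholder device identifier (naa.600xxxx) and sample values -- substitute your own device ID and tune the thresholds for your array:

```shell
# Enable adaptive queue-depth throttling on one device (LUN).
# When the array returns QFULL/BUSY errors at least --queue-full-threshold
# times within --queue-full-sample-size completed commands, ESXi reduces
# the LUN queue depth, then restores it gradually as the errors subside.
# naa.600xxxx is a placeholder -- substitute the actual device identifier.
esxcli storage core device set --device naa.600xxxx \
    --queue-full-sample-size 32 --queue-full-threshold 4

# Verify the current settings for the device.
esxcli storage core device list --device naa.600xxxx
```

Because the throttling reacts to queue-full errors from the array itself, it's most useful on shared arrays where several hosts compete for the same LUNs.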

Limit disk requests if necessary

VM disk requests aren't limited by default. When two or more VMs share the same LUN, one VM might monopolize the storage I/O and reduce performance for the other VMs on that LUN. Look for hypervisor features, such as request throttling, that can adjust the maximum number of concurrent disk requests per volume. When multiple VMs share the LUN, such throttling limits the total number of outstanding commands from all of the VMs on that LUN. If only one VM resides on the LUN, the feature has no benefit.
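On ESXi 5.5 and later, this cap is exposed per device through esxcli; older releases used the host-wide Disk.SchedNumReqOutstanding advanced setting instead. A sketch with a placeholder device identifier:

```shell
# Cap the number of outstanding disk requests that all VMs together can
# issue to one device. The value takes effect only when more than one VM
# is active on the LUN; naa.600xxxx is a placeholder device identifier.
esxcli storage core device set --device naa.600xxxx \
    --sched-num-req-outstanding 32
```

The value is typically kept at or below the adapter queue depth for the device, so that the scheduler's limit, not the adapter queue, is what arbitrates between the VMs.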

Allocate storage I/O resources dynamically


A hypervisor can be extremely adept at dynamically allocating storage I/O bandwidth to maintain workload performance during periods of I/O contention. For example, VMware ESXi can allocate I/O to VMs through a system of disk shares -- the more shares assigned to a VM, the more storage I/O that VM can utilize. In addition to disk shares, administrators can apply IOPS limits to prevent uncontrolled I/O utilization by individual VMs. More recently, storage I/O control (SIOC) features can check the disk shares of all the VMs accessing a LUN and allocate the storage I/O resources accordingly, providing more holistic evaluation and control over storage bandwidth use.
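Per-disk shares and IOPS caps can also be expressed directly as VM configuration options. The fragment below is a sketch, assuming the sched.* disk-scheduler options in a VM's .vmx file; scsi0:0 and the values shown are placeholders, and option availability varies by ESXi version:

```
# .vmx fragment (not a shell script): per-virtual-disk scheduler settings.
# shares accepts low, normal, high or a custom numeric value;
# throughputCap limits the disk's IOPS ("off" means unlimited).
sched.scsi0:0.shares = "2000"
sched.scsi0:0.throughputCap = "1000"
```

Shares only matter under contention -- a VM with high shares sees no benefit while the LUN is idle -- whereas an IOPS cap is enforced at all times.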

Monitor storage latency

Finally, it's good practice to monitor storage I/O latency using objective tools that can report factors such as device latency, time spent in the kernel and the latency experienced by the guest OS. For example, ESXi provides tools like esxtop and resxtop that report storage device latency in the guest average latency per command (GAVG/cmd) metric. The average latency will depend on the actual storage subsystem in place -- some storage systems are simply faster than others. Administrators who employ storage I/O control features can compare GAVG/cmd against the SIOC congestion threshold; the measured latency should remain notably below that threshold.
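A sketch of collecting these counters with esxtop on an ESXi host (resxtop works the same way remotely); the file name is a placeholder:

```shell
# Capture statistics in batch mode for later analysis in a spreadsheet
# or with tools such as perfmon:
# -b = batch mode, -d = sampling delay in seconds, -n = iterations.
# This collects 30 samples at 10-second intervals (about 5 minutes).
esxtop -b -d 10 -n 30 > esxtop-storage-stats.csv
```

In interactive mode, pressing u switches esxtop to the disk-device view, where DAVG/cmd (device latency), KAVG/cmd (time in the kernel) and GAVG/cmd (latency seen by the guest) appear side by side, making it easy to spot whether delays originate in the array or in the hypervisor.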

Next Steps

Learn about storage technology advancements

Avoid VM load-balancing mistakes

Navigate the VM-aware storage arena
