Storage performance problems often stem from poor configuration or device contention. That might sound straightforward, but locating the root cause of storage latency can be a real challenge.
Technicians can use logs and benchmarks to narrow down the problem. Logs report errors with specific devices, so start by investigating any recent issues, such as storage device timeouts or other error messages. Hypervisor-specific tools can offer detailed performance and storage latency data. For example, VMware's esxtop utility reports the average time taken to process storage commands, such as DAVG/cmd (latency at the device) and GAVG/cmd (latency as seen by the guest). Third-party tools like IOMeter or HD_Speed can report on I/O throughput and allow performance comparisons between virtual and physical machines that use the same storage resources. These comparisons can help locate problem areas, and repeating the tests after each corrective action can gauge its effect.
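A small synchronous write test gives repeatable before-and-after numbers. The Python sketch below is a rough illustration and no substitute for IOMeter, HD_Speed or esxtop. It assumes you run it inside the guest (or on the physical machine) and that `directory` sits on the storage being tested. The function name `measure_write_latency` and its parameters are hypothetical.

```python
import os
import statistics
import tempfile
import time


def measure_write_latency(directory, block_size=4096, count=64):
    """Write `count` blocks of `block_size` bytes to a scratch file in
    `directory`, calling fsync after each write so every timing includes the
    wait for the storage stack to acknowledge the write. Returns average and
    95th-percentile latency in milliseconds plus overall throughput in MB/s."""
    payload = os.urandom(block_size)
    latencies_ms = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # block until the write is reported as durable
            latencies_ms.append((time.perf_counter() - t0) * 1000.0)
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    ranked = sorted(latencies_ms)
    return {
        "samples": count,
        "avg_ms": statistics.mean(latencies_ms),
        "p95_ms": ranked[int(0.95 * (count - 1))],
        "mb_per_s": block_size * count / (1024 * 1024) / elapsed,
    }


if __name__ == "__main__":
    # Placeholder target: replace with a directory on the storage under test.
    print(measure_write_latency(tempfile.gettempdir()))
```

Run the same call on a VM and on a physical machine backed by the same array, then again after each corrective change. This exercises only one small, synchronous write pattern. Dedicated tools cover far broader read/write and block-size mixes.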
Storage performance can be compromised by excess stress in the storage subsystem, meaning there is too much traffic contention at the storage interface, the storage controller or somewhere in the storage network. Technicians often isolate performance problems by making controlled changes to the storage environment. For example, try migrating afflicted VMs to an alternate storage location, such as a different disk on the local server or a different LUN on another storage array. Contention can also occur when too many VMs attempt to access the same LUN (occasionally reported as SCSI reservation conflicts), so systematically migrating workloads to other storage locations can ease contention for the VMs that remain.
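One way to plan a systematic migration is to move the busiest VMs off overloaded LUNs first. The Python sketch below uses a simple greedy heuristic for illustration only. It is not how any particular hypervisor's automated storage balancing works. It assumes you already have per-VM IOPS figures from a monitoring tool and a single per-LUN IOPS ceiling. The inventory format, `plan_migrations` and the limit value are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory format: LUN name -> list of (VM name, observed IOPS).
Inventory = Dict[str, List[Tuple[str, int]]]


def plan_migrations(inventory: Inventory, lun_iops_limit: int) -> List[Tuple[str, str, str]]:
    """Greedy sketch: while a LUN's total IOPS exceeds lun_iops_limit, move its
    busiest VM to the least-loaded LUN that can absorb it without exceeding
    the limit. Returns a list of (vm, source_lun, target_lun) moves."""
    load = {lun: sum(iops for _, iops in vms) for lun, vms in inventory.items()}
    placement = {lun: sorted(vms, key=lambda v: v[1], reverse=True)
                 for lun, vms in inventory.items()}
    moves = []
    # Visit the most heavily loaded LUNs first.
    for src in sorted(load, key=load.get, reverse=True):
        for vm, iops in list(placement[src]):
            if load[src] <= lun_iops_limit:
                break  # this LUN is no longer overloaded
            targets = [lun for lun in load
                       if lun != src and load[lun] + iops <= lun_iops_limit]
            if not targets:
                continue  # VM fits nowhere; try the next, smaller VM
            dst = min(targets, key=load.get)
            placement[src].remove((vm, iops))
            placement[dst].append((vm, iops))
            load[src] -= iops
            load[dst] += iops
            moves.append((vm, src, dst))
    return moves


if __name__ == "__main__":
    inventory = {
        "LUN-A": [("web1", 900), ("db1", 1500), ("app1", 400)],
        "LUN-B": [("file1", 300)],
        "LUN-C": [],
    }
    print(plan_migrations(inventory, lun_iops_limit=2000))
```

For the sample inventory, the plan moves only `db1` from LUN-A to LUN-C. Moving one workload at a time and re-measuring keeps each change controlled.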
Inconsistent configurations in iSCSI or other networked storage can also cause storage latency. For example, if iSCSI storage uses jumbo frames, every vSwitch and every other network device in the path must support jumbo frames and be configured for the same frame size, which is usually set as the maximum transmission unit (MTU).
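The smallest MTU anywhere on the iSCSI path caps the frame size that gets through, and at layer 2 oversized frames are typically dropped rather than fragmented. The Python sketch below flags hops whose MTU differs from the jumbo-frame target. It also computes the largest ICMP payload a don't-fragment ping should carry end to end: the MTU minus a 20-byte IPv4 header and an 8-byte ICMP header. On ESXi, that check is commonly run as `vmkping -d -s 8972` against the storage target when the MTU is 9000. The hop names and `check_jumbo_path` are hypothetical, and the MTU values would come from your own switch and host configuration.

```python
from typing import Dict, List, Tuple

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def check_jumbo_path(path_mtus: Dict[str, int],
                     expected_mtu: int = 9000) -> Tuple[List[str], int, int]:
    """path_mtus maps each hop (VMkernel port, vSwitch, physical switch port,
    array target port) to its configured MTU. Returns the hops that differ
    from expected_mtu, the effective end-to-end MTU, and the largest ICMP
    payload that should pass a don't-fragment ping."""
    mismatched = [hop for hop, mtu in path_mtus.items() if mtu != expected_mtu]
    effective_mtu = min(path_mtus.values())  # the smallest hop limits the path
    ping_payload = effective_mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    return mismatched, effective_mtu, ping_payload


if __name__ == "__main__":
    path = {"vmk1": 9000, "vSwitch1": 9000,
            "physical-switch-port-12": 1500, "array-port-A": 9000}
    print(check_jumbo_path(path))
```

In the sample path, one physical switch port left at the default 1,500 bytes limits the whole path, even though every other hop is set for jumbo frames.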
Configuration issues can also extend to outdated firmware on the physical server and on the local host bus adapter (HBA). Each time a hypervisor is updated to a new version, firmware may also need to be updated to meet new feature and functionality requirements. If it isn't, the hypervisor update may fail to install, hardware may not function, or performance may be slow or erratic. Check firmware versions and update any that are out of date.
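Checking firmware versions by hand is error-prone, partly because version strings compare incorrectly as text ("11.2.3" sorts after "11.10.0"). The Python sketch below compares dotted versions numerically. It assumes purely numeric versions such as `11.2.3`, which real vendors do not always use. The component names, version numbers and the `outdated_firmware` helper are hypothetical. The minimum versions would come from the server and HBA vendors' guidance for the target hypervisor release.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '11.2.3' into (11, 2, 3) so versions compare numerically.
    Assumes dot-separated integers only."""
    return tuple(int(part) for part in version.split("."))


def outdated_firmware(installed: Dict[str, str], minimum: Dict[str, str]) -> List[str]:
    """Return components whose installed firmware is missing from the
    inventory or older than the required minimum version."""
    return [component for component, required in minimum.items()
            if component not in installed
            or parse_version(installed[component]) < parse_version(required)]


if __name__ == "__main__":
    installed = {"HBA": "11.2.3", "BIOS": "2.10.0", "NIC": "20.5"}
    minimum = {"HBA": "11.10.0", "BIOS": "2.9.1", "NIC": "20.5"}
    print(outdated_firmware(installed, minimum))
```

With the sample data, only the HBA is flagged. A plain string comparison would have missed it.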
If storage performance is poor from the start, consider possible incompatibilities between the hypervisor and the storage array or HBAs. If performance starts off fine and falters later, compatibility probably isn't the issue. Compatibility problems are much rarer today than in years past, but it's still worth checking the storage and controllers against the hypervisor's hardware compatibility list (HCL).
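The sanity check against a hardware compatibility list boils down to a lookup: is this model listed for this hypervisor release, and is its installed firmware among the certified versions? The Python sketch below assumes you have transcribed the relevant compatibility-list entries into a dictionary by hand. It does not query any vendor service, and the model names, release number and `hcl_problems` function are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, hand-built excerpt of a compatibility list:
# (component model, hypervisor release) -> firmware versions certified for it.
Hcl = Dict[Tuple[str, str], Set[str]]


def hcl_problems(inventory: Dict[str, str], release: str,
                 hcl: Hcl) -> List[Tuple[str, str]]:
    """inventory maps each storage array, controller or HBA model to its
    installed firmware. Returns (model, reason) for anything not certified
    for the given hypervisor release."""
    problems = []
    for model, firmware in sorted(inventory.items()):
        certified = hcl.get((model, release))
        if certified is None:
            problems.append((model, "model not listed for this release"))
        elif firmware not in certified:
            problems.append((model, f"firmware {firmware} not certified"))
    return problems


if __name__ == "__main__":
    hcl = {("HBA-X200", "8.0"): {"14.2", "14.4"},
           ("Array-Q9", "8.0"): {"6.1"}}
    inventory = {"HBA-X200": "14.0", "Array-Q9": "6.1", "RAID-Z1": "3.3"}
    for model, reason in hcl_problems(inventory, "8.0", hcl):
        print(model, "->", reason)
```

An empty result only means nothing in the transcribed excerpt contradicts the inventory. The vendor's published list remains the authority.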
Storage latency can ruin VM performance and cause serious headaches for IT staff, but it can be addressed by ensuring hardware is compatible and properly configured, and by keeping an arsenal of effective diagnostic tools at hand. Don't overlook the value of documentation and change management in storage performance troubleshooting, or in any troubleshooting. Each change in the virtualized environment can carry unforeseen consequences that might disrupt performance. Documenting and tracking each change gives IT professionals a clear rollback path and helps them identify cause-and-effect relationships, which eases trial-and-error troubleshooting.
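The value of a change log comes from pairing each change with a way to undo it and a before-and-after measurement. The Python sketch below illustrates that idea with a hypothetical in-memory `ChangeLog` class. In practice this record usually lives in a ticketing or change management system. The field names, the latency figures and the tolerance parameter are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Change:
    description: str                         # what was changed
    rollback: str                            # how to undo it
    latency_before_ms: float                 # measured just before the change
    latency_after_ms: Optional[float] = None  # filled in after re-testing


@dataclass
class ChangeLog:
    changes: List[Change] = field(default_factory=list)

    def record(self, description: str, rollback: str,
               latency_before_ms: float) -> Change:
        change = Change(description, rollback, latency_before_ms)
        self.changes.append(change)
        return change

    def regressions(self, tolerance_ms: float = 0.0) -> List[Tuple[str, str]]:
        """Changes after which measured latency got worse by more than
        tolerance_ms, newest first, each paired with its rollback step."""
        return [(c.description, c.rollback) for c in reversed(self.changes)
                if c.latency_after_ms is not None
                and c.latency_after_ms > c.latency_before_ms + tolerance_ms]


if __name__ == "__main__":
    log = ChangeLog()
    c1 = log.record("Enable jumbo frames on vSwitch1",
                    "Set vSwitch1 MTU back to 1500", 12.0)
    c1.latency_after_ms = 6.5
    c2 = log.record("Update HBA firmware to 14.4",
                    "Reflash HBA firmware 14.0", 6.5)
    c2.latency_after_ms = 9.0
    print(log.regressions(tolerance_ms=1.0))
```

In the example, the first change improved latency and the second made it worse. Only the firmware update is reported, along with the step needed to roll it back.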