Storage performance problems often arise from poor configuration or device contention. While this might sound straightforward, locating the root cause of storage latency can be a challenge.
Technicians can use logs and benchmarks to help narrow down the problem. Logs report errors with specific devices, so start by investigating any recent issues, such as storage device timeouts or other error messages. Hypervisor-specific tools can offer detailed performance and storage latency data. For example, VMware's esxtop utility can report the average time to process storage commands, broken down into device (DAVG/cmd), kernel (KAVG/cmd) and guest (GAVG/cmd) latency. Third-party tools like IOMeter or HD_Speed can report on I/O throughput and allow performance comparisons of virtual and physical machines that use the same storage resources. This can help to locate problem areas, and repeating the tests after each corrective action can gauge its effect.
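To make repeatable benchmarking concrete, here is a minimal Python sketch that times small synchronous writes in a given directory and reports average latency and throughput. It is only a rough stand-in for a real tool such as IOMeter or HD_Speed, and the function names measure_write_latency and percent_change are illustrative, not part of any product. Running it inside a VM and on a physical machine against the same storage, and again after each fix, gives numbers that can be compared directly.

```python
import os
import tempfile
import time


def measure_write_latency(directory: str, block_size: int = 4096, count: int = 256) -> dict:
    """Write `count` blocks of `block_size` bytes, syncing each one to storage.

    Returns average per-write latency in milliseconds and throughput in MiB/s.
    A rough illustration only; it does not replace a dedicated benchmark tool.
    """
    block = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the write past the OS cache to the device
            latencies.append(time.perf_counter() - t0)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return {
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),
        "throughput_mib_s": (block_size * count) / (1024 * 1024) / elapsed,
    }


def percent_change(before: float, after: float) -> float:
    """Relative change between two runs; a negative value means it went down."""
    return 100.0 * (after - before) / before
```

Calling percent_change on the average latency from a run before and a run after a corrective action shows whether the change actually helped.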
Storage performance can suffer when there is excess stress in the storage subsystem, such as too much traffic contention at the storage interface, at the storage controller or somewhere in the storage network. Technicians often isolate performance problems by making controlled changes to the storage environment. For example, try migrating afflicted VMs to an alternate storage location, such as a different disk on the local server or a different LUN on another storage array. Contention can also occur when too many VMs attempt to access the same LUN, which is sometimes reported as SCSI reservation conflicts, so systematically migrating workloads to other storage locations can ease contention for the remaining VMs.
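A systematic migration can be planned on paper before anything moves. The sketch below assumes you already have per-VM IOPS figures (for example, from hypervisor monitoring) and a per-LUN IOPS budget you have chosen yourself. Both the budget and the greedy busiest-VM-first strategy are assumptions made for illustration, not a vendor-recommended method, and plan_migrations is a hypothetical helper.

```python
from collections import defaultdict


def plan_migrations(
    placement: dict[str, str],
    iops: dict[str, int],
    max_lun_iops: int,
    spare_luns: tuple[str, ...] = (),
) -> list[tuple[str, str, str]]:
    """Greedily move the busiest VMs off LUNs whose combined IOPS exceed a budget.

    placement: VM name -> LUN name; iops: VM name -> observed IOPS.
    max_lun_iops is an assumed per-LUN budget, not a vendor figure.
    Returns a list of (vm, from_lun, to_lun) proposed moves.
    """
    placement = dict(placement)  # work on a copy so the caller's map is untouched
    luns = sorted(set(placement.values()) | set(spare_luns))
    load = defaultdict(int)
    for vm, lun in placement.items():
        load[lun] += iops[vm]

    moves = []
    for lun in luns:
        # Busiest VMs first: moving one of them relieves the LUN fastest.
        on_lun = sorted(
            (vm for vm, home in placement.items() if home == lun),
            key=lambda vm: iops[vm],
            reverse=True,
        )
        for vm in on_lun:
            if load[lun] <= max_lun_iops:
                break  # this LUN is back within budget
            # Least-loaded other LUN; ties resolve alphabetically because luns is sorted.
            target = min((other for other in luns if other != lun),
                         key=lambda other: load[other], default=None)
            if target is None or load[target] + iops[vm] > max_lun_iops:
                continue  # no room for this VM anywhere; try a smaller one
            load[lun] -= iops[vm]
            load[target] += iops[vm]
            placement[vm] = target
            moves.append((vm, lun, target))
    return moves
```

With vm1 (500 IOPS), vm2 (300) and vm3 (200) on lunA, a 600 IOPS budget and an empty spare lunC, the sketch proposes moving only vm1 to lunC. Each proposed move should still be made one at a time and re-measured, in keeping with the controlled-change approach.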
Storage latency can also arise with iSCSI or other networked storage when configurations are inconsistent. For example, if iSCSI storage uses jumbo frames, it's critical that every vSwitch and other network device in the path be compatible with and properly configured for the same jumbo frame size, which is usually expressed as the maximum transmission unit (MTU), typically 9,000 bytes.
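A mismatch at any hop is enough to cause trouble, so it helps to check every device against one expected value. This Python sketch assumes you have already collected the MTU of each hop into a dictionary; the device names are hypothetical. It also computes the largest ICMP payload that fits in one unfragmented IPv4 packet, which is why a 9,000-byte path on ESXi is commonly tested with vmkping -d -s 8972 (-d sets the don't-fragment bit).

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def mtu_mismatches(path_mtus: dict[str, int], expected: int = 9000) -> list[str]:
    """Return devices on the storage path whose MTU differs from the expected value."""
    return [device for device, mtu in path_mtus.items() if mtu != expected]


# Hypothetical inventory of one iSCSI path, host side to array side.
path = {"vmk1": 9000, "vSwitch1": 9000, "switch-port-12": 1500, "array-port-0": 9000}
print(max_ping_payload(9000))  # 8972
print(mtu_mismatches(path))    # ['switch-port-12']
```

A single 1,500-byte hop in an otherwise jumbo-frame path is exactly the kind of inconsistency that forces fragmentation or dropped packets.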
Configuration issues can also extend to outdated firmware on the physical server and on the local host bus adapter (HBA). Each time a hypervisor is updated to a new version, firmware may also need to be updated to meet new feature and functionality requirements. If it isn't, the hypervisor update may fail to install, hardware may not function, or performance may be slow or erratic. Check firmware versions and update any that are outdated.
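Comparing firmware versions by eye is error-prone when version strings have different lengths. The sketch below assumes purely numeric dotted versions (such as 11.4.2) and a minimum-version list you maintain from the hypervisor vendor's guidance; the component names and versions are hypothetical. Components missing from the inventory are flagged too, since an unknown version can't be verified.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '11.4.2' into (11, 4, 2); assumes numeric dot-separated parts only."""
    parts = [int(part) for part in version.split(".")]
    # Drop trailing zeros so '2.1' and '2.1.0' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def outdated_firmware(installed: dict[str, str], minimum: dict[str, str]) -> list[str]:
    """List components whose firmware is older than the minimum, or is unknown."""
    return [
        component
        for component, required in minimum.items()
        if component not in installed
        or parse_version(installed[component]) < parse_version(required)
    ]


# Hypothetical inventory and minimums for one host.
installed = {"bios": "2.1.0", "hba": "11.4.2"}
minimum = {"bios": "2.1", "hba": "12.0.1", "nic": "1.0"}
print(outdated_firmware(installed, minimum))  # ['hba', 'nic']
```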
If storage performance is poor from the start, consider potential incompatibilities between the hypervisor and the storage array or HBAs. If performance starts off fine and then falters later on, chances are compatibility isn't the issue. Compatibility problems are much rarer today than in years past, but it's still worth performing a sanity check of the storage and controllers against the hypervisor's hardware compatibility list.
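At its simplest, that sanity check is an exact-match lookup. This sketch assumes you have transcribed the relevant hardware compatibility list rows into simple records of vendor, model, driver and hypervisor release; that record layout is an assumption, and real compatibility guides carry more detail, such as supported firmware versions. The vendor and model names are hypothetical.

```python
from typing import NamedTuple


class Component(NamedTuple):
    """One storage component, in an assumed simplified HCL record layout."""
    vendor: str
    model: str
    driver: str
    hypervisor_release: str


def not_on_hcl(inventory: list[Component], hcl: set[Component]) -> list[Component]:
    """Return inventory items with no exact match in the compatibility list."""
    return [item for item in inventory if item not in hcl]


hcl = {Component("VendorA", "HBA-100", "driver-2.1", "8.0")}
inventory = [
    Component("VendorA", "HBA-100", "driver-2.1", "8.0"),
    Component("VendorB", "Array-X", "driver-1.0", "8.0"),
]
for item in not_on_hcl(inventory, hcl):
    print("Not on HCL:", item)
```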
Storage latency can ruin VM performance and cause serious headaches for IT staff, but these problems can be addressed by ensuring compatible, properly configured hardware and by using an arsenal of effective diagnostic tools. Don't overlook the value of documentation and change management in storage performance troubleshooting, or in any troubleshooting. Each change in the virtualized environment can carry unforeseen consequences that might disrupt performance. Documenting and tracking each change gives IT professionals a clear rollback path and helps them identify cause-and-effect relationships, which can ease trial-and-error troubleshooting.
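Even a lightweight change log pays off when a regression appears. This Python sketch records each change along with its rollback step and, given the time a latency regression was noticed, lists the changes made in a preceding window, most recent first, as the first suspects. The record fields, the example entries and the window length are illustrative assumptions, not a prescribed change management format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    when: datetime
    description: str
    rollback: str  # how to undo this change if it turns out to be the cause


def suspect_changes(log: list[Change], regression_at: datetime, window: timedelta) -> list[Change]:
    """Changes made within `window` before the regression, most recent first."""
    candidates = [c for c in log if regression_at - window <= c.when <= regression_at]
    return sorted(candidates, key=lambda c: c.when, reverse=True)


# Hypothetical log entries.
log = [
    Change(datetime(2024, 1, 1, 8, 0), "Enabled jumbo frames on vSwitch1", "Set MTU back to 1500"),
    Change(datetime(2024, 1, 2, 9, 0), "Updated HBA firmware", "Reflash previous firmware image"),
    Change(datetime(2024, 1, 2, 11, 0), "Migrated VM to another LUN", "Migrate VM back"),
]
for change in suspect_changes(log, datetime(2024, 1, 2, 12, 0), timedelta(hours=24)):
    print(change.when, change.description, "| rollback:", change.rollback)
```

Rolling back the most recent suspect first, then re-measuring, turns trial and error into an ordered search.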
Related Q&A from Stephen J. Bigelow