Storage is typically one of the biggest performance bottlenecks in a virtual infrastructure, simply because of the latency inherent to spinning hard disks. While storage area networks and RAID technology have reduced that latency, they haven't been able to eradicate it completely. This doesn't mean everything wrong with your application performance is storage-related, but it's usually a good place to look when you're trying to find bottlenecks.
The key indicator of a storage-related bottleneck is the latency of your application or OS, as opposed to IOPS. Often, we get caught up in the raw IOPS of what an application states it needs, or what the storage array can deliver. This focus on the printed specifications, such as 5,000 IOPS for a database, can contribute to bottlenecks in the overall storage response time. When we focus on IOPS and not latency, we tend to forget about the entire storage path and all of the pieces in it that can cause delays.
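To see why a headline IOPS figure says little about response time, a rough sketch using Little's Law (outstanding I/Os = IOPS x average latency) may help. The function name and the queue depths below are illustrative assumptions, not figures from any particular array.

```python
def mean_latency_ms(iops: float, outstanding_ios: float) -> float:
    """Average latency implied by Little's Law.

    outstanding I/Os = IOPS * latency, so latency = outstanding I/Os / IOPS.
    The result is converted from seconds to milliseconds.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios / iops * 1000.0


# The same 5,000 IOPS can hide very different response times,
# depending on how many I/Os are queued along the storage path.
for depth in (1, 8, 32):  # hypothetical queue depths
    latency = mean_latency_ms(5000, depth)
    print(f"{depth:>2} outstanding I/Os at 5,000 IOPS: {latency:.1f} ms")
```

Under these assumptions, a database that sustains 5,000 IOPS with 32 I/Os in flight is waiting several milliseconds per request, while the same rate with a single I/O in flight completes each request in a fraction of a millisecond. The IOPS number is identical in both cases; only latency reveals the difference.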
Causes of application latency
Several factors can affect latency, including drive configuration, server connection and even the backplane of the storage frame itself. Drive IOPS is only a small piece of the overall puzzle. IOPS is important as you design your infrastructure, but less so when you're looking at actual performance. While the disk and the I/O it delivers are significant, don't consider that the only place you'll find bottlenecks.
Most storage backplanes have several channels to which the data is sent from the drives. This can be a sticking point if your storage frame loads aren't balanced across those channels. This balancing process is normally done manually by the location of the drives in the frame.
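One simple way to spot an unbalanced frame is to compare the busiest channel against the average load across all channels. The sketch below assumes you can export per-channel load (here in IOPS) from your array's management tools; the channel names and numbers are hypothetical.

```python
from statistics import mean


def channel_imbalance(load_by_channel: dict[str, float]) -> float:
    """Ratio of the busiest channel's load to the average channel load.

    A value near 1.0 means the load is evenly spread; larger values
    mean one channel is carrying a disproportionate share.
    """
    loads = list(load_by_channel.values())
    if not loads:
        raise ValueError("at least one channel is required")
    average = mean(loads)
    if average == 0:
        return 1.0  # an idle frame is trivially balanced
    return max(loads) / average


# Hypothetical per-channel load, in IOPS, for a four-channel backplane.
observed = {"ch0": 9000, "ch1": 2000, "ch2": 2500, "ch3": 2500}
print(f"imbalance ratio: {channel_imbalance(observed):.2f}")
```

What ratio counts as a problem depends on the frame, but a channel carrying more than twice the average load is a reasonable prompt to review where the drives sit.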
Another key point to remember is the connection from the storage frame to your server. This can consist of a Fibre Channel or iSCSI network with front-end ports. These ports should be on a dedicated network, especially in the case of iSCSI, and you must take care not to overload the ports of the storage frame, which can create a bottleneck.
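A quick check for overloaded front-end ports is to compare observed throughput with the link's raw bandwidth. The following is a minimal sketch: the port names, throughput figures and the 70% warning threshold are assumptions for illustration, and the calculation ignores protocol overhead.

```python
def port_utilization(throughput_mb_s: float, link_gbit_s: float) -> float:
    """Fraction of a port's raw link bandwidth in use.

    Throughput is in megabytes per second, link speed in gigabits
    per second. Protocol overhead is ignored, so real headroom is
    somewhat smaller than this suggests.
    """
    if link_gbit_s <= 0:
        raise ValueError("link_gbit_s must be positive")
    return (throughput_mb_s * 8 / 1000) / link_gbit_s


# Hypothetical front-end ports on 10 Gbit/s iSCSI links, with observed MB/s.
ports = {"fe-0a": 1100, "fe-0b": 300}
WARN_THRESHOLD = 0.70  # assumed threshold, tune for your environment

busy = [name for name, mb_s in ports.items()
        if port_utilization(mb_s, 10) > WARN_THRESHOLD]
print("ports above threshold:", busy)
```

If one port is consistently above the threshold while its peers sit idle, rebalancing paths across the front-end ports is likely to help more than adding drives.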
One of the most effective ways to combat storage issues is to replace mechanical hard disk drives with solid-state drives. These remove the mechanical limitations of the older drives and bring storage latency far closer to that of CPU and memory. Though this sounds ideal, it raises cost concerns, as well as a whole new issue when it comes to bottlenecks.
With storage no longer the biggest culprit, other issues in the application stack may come to light. Resources that weren't problems before, such as CPU or memory, may now jump to the forefront as bottlenecks. This shake-up of infrastructure pain points can send engineers looking for resolutions to problems they have never faced before.
Hyper-converged infrastructures and blades
Hyper-converged infrastructures put a unique twist on bottlenecks. Though a hyper-converged infrastructure combines everything within one device, that doesn't mean it removes all bottlenecks. Rather, it can make finding bottlenecks harder, since you may not have all the traditional diagnostic tools that come with the individual pieces of the infrastructure. This can make seeing "what's under the hood" a challenge as you work to find bottlenecks in hyper-converged infrastructures.
Many of these same challenges exist with blades, as well. Bottlenecks affecting one piece can affect the entire blade, including unrelated applications. If the tools provided can't see all the connections, you might not even be able to find bottlenecks, let alone try to fix them.
Often, the most overlooked challenge to identifying bottlenecks is the application itself. Too often, we start the investigation thinking the hardware or the infrastructure is the issue because we have been conditioned to do so. It's important to realize that applications have limits just like everything else in IT, and they should not be overlooked. Bottlenecks can exist anywhere now, and to find them, we must look at the entire picture.
In the first two installments of this series, we discussed the difference between bottlenecks and faults, explained how to understand performance data and application limits, and offered tips on how to eliminate bottlenecks in CPU, memory and networks.