You can spend weeks -- even months -- designing the ideal infrastructure to ensure you'll have the proper resources and growth potential, but no matter how well you prepare, eventually you're going to experience application performance issues or bottlenecks.
Bottlenecks vs. faults
A bottleneck is not the same thing as a fault or an error. A fault is usually easy to spot because something has stopped working, and once it is identified, it can be corrected. A bottleneck is a constraint that restricts throughput and causes application performance issues, and it often can't be corrected by traditional break-fix means. Its effect can range anywhere from a slight annoyance to severely reduced system performance. What separates a bottleneck from a fault is that with a bottleneck, the application in question still works to a degree. Therein lies the problem: It's easier to find and fix something that is broken than it is to identify and troubleshoot application performance issues. When something is broken, you can take steps to bring it back online, but when something is partially working, your troubleshooting could make the situation worse by bringing it down completely.
It's important to understand that a bottleneck isn't always a mistake. A bottleneck is a part of the system that can't keep up with the rest of the system. Now, in an ideal world, all of the pieces of an application infrastructure would operate at the same performance level and the system would be perfectly in tune. Unfortunately, that isn't the case. Some parts of the application infrastructure operate faster than others. In and of itself, this is not a fault or a design issue; it's simply the way the application interacts with the infrastructure. To fix the bottleneck, you have to identify the constraint.
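The idea that the slowest part sets the pace for the whole system can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the function name `find_constraint`, the stage names and the capacity figures are all hypothetical, and it assumes each stage's sustainable throughput is already known.

```python
def find_constraint(capacities):
    """Return (stage_name, capacity) for the stage with the lowest capacity.

    `capacities` maps a stage name to the requests per second that stage
    can sustain. The lowest value caps what the whole chain can deliver.
    """
    if not capacities:
        raise ValueError("at least one stage is required")
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical figures, for illustration only.
stages = {"web": 1200, "app": 900, "database": 350, "storage": 600}
constraint, ceiling = find_constraint(stages)
print(f"Constraint: {constraint}; end-to-end ceiling is about {ceiling} requests/s")
```

In this made-up example, adding web servers would change nothing, because the database stage is the constraint.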
Monitoring your environment
Every system will experience a bottleneck, but that's only a bad thing if the performance of the application is adversely affected. In general, bottlenecks occur because some part of the system is slower than the rest, but you can only tell what's slowest if you have true performance data. The first step to troubleshooting a bottleneck is identifying a true bottleneck. That's why monitoring is so critical, and not just from the infrastructure side; it needs to happen at the application or user end as well. Timing and impact are also essential: Is the bottleneck appearing too soon? What exactly happens when the bottleneck occurs? Does the bottleneck cause a disastrous failure or is its impact incremental? It's impossible to remove all bottlenecks because, simply put, applications and infrastructure can't scale indefinitely. However, the goal here is to prevent bottlenecks that occur too soon or have a disastrous impact on your applications.
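As one way to gather timing data from the application side, the sketch below accumulates how long each stage of a request takes and reports the slowest one. The class name `StageTimer` and the stage labels are assumptions made for illustration; a production environment would more likely rely on an established monitoring product.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate elapsed time per named stage of a request."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be tested without waiting.
        self._clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def measure(self, stage):
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the measured code raises.
            self.totals[stage] += self._clock() - start

    def slowest(self):
        """Return the stage with the largest accumulated time."""
        if not self.totals:
            raise ValueError("no stages have been measured yet")
        return max(self.totals, key=self.totals.get)


# Hypothetical usage: wrap the parts of a request you want to compare.
timer = StageTimer()
with timer.measure("query"):
    sum(range(100_000))  # stand-in for a database call
with timer.measure("render"):
    sorted(range(1_000))  # stand-in for building the interface
print("Slowest stage so far:", timer.slowest())
```

Numbers like these only become meaningful when they are collected over time and compared with what users actually experience.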
A critical element in all of this is the application itself. Purchasing a product designed to look for bottlenecks in your environment is of little value if you don't know enough about your application. Without that knowledge, the tool is simply giving you numbers and values that have no real meaning to you or your application. The data should show you the bottleneck's effect on the application's performance. Remember, the end user doesn't see or truly care about the back-end infrastructure; they care about the application they see and use every day, and that must correlate to what you are monitoring.
Locating the bottleneck
For the administrator, locating the bottleneck becomes a layered process. Start with scope: Is a single person affected or multiple users? If it's a single person, the problem is likely related to that user or their client device, but when it happens to multiple users, that points to something on the back end. Most client-server applications involve a front-end web piece combined with something on the back end, such as a database server. Your first step is to check the basic performance metrics, such as CPU, memory, networking and storage. You must rule out simple things like a hung process or low disk space before moving on to more involved possibilities.
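The first pass described above can be sketched as a small triage function. Everything here is an assumption made for illustration: the function name `triage`, the metric names and especially the threshold values, which would need tuning for any real environment.

```python
# Illustrative thresholds only; tune them for your own environment.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 90.0,
    "network_percent": 80.0,
    "disk_used_percent": 95.0,
}


def triage(affected_users, metrics, thresholds=THRESHOLDS):
    """Suggest where to look first.

    Returns a (scope, saturated) tuple. `scope` is "client" when only one
    user is affected and "back end" otherwise. `saturated` lists the basic
    metrics at or above their threshold, sorted by name.
    """
    if affected_users <= 1:
        return "client", []
    saturated = sorted(
        name
        for name, limit in thresholds.items()
        if metrics.get(name, 0.0) >= limit
    )
    return "back end", saturated


# Hypothetical readings from a database server.
scope, saturated = triage(12, {"cpu_percent": 35.0, "disk_used_percent": 97.0})
print(scope, saturated)
```

An empty list for a back-end problem means the basics are ruled out and it is time to look at which part of the application is lagging.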
Once the basics are ruled out, you can look at what part of the application process is lagging. Is it running queries or pulling up the interface? This will give you an indication of where to start looking. When you've identified a starting point, you can then examine logs, performance stats and history to see whether anything points to application performance issues. Keep in mind, you aren't looking for an error, but for a pattern of events that shows a drop in performance. These can be disk or networking timeouts or longer-than-normal I/O queue lengths. You may need to pull additional stats from your networking or storage devices if your servers are only showing the result of the issues.
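One simple way to look for a pattern rather than a single event is to compare recent samples with a baseline and flag only sustained deviations. The sketch below does that for any numeric series, such as I/O queue length. The function name, the three-sigma limit and the run length of three samples are assumptions chosen for illustration, not recommended values.

```python
from statistics import mean, stdev


def sustained_degradation(baseline, recent, sigmas=3.0, min_run=3):
    """Return True if `recent` has at least `min_run` consecutive samples
    above the baseline mean plus `sigmas` standard deviations.

    A single spike is ignored; only a run of high values counts.
    """
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two samples")
    limit = mean(baseline) + sigmas * stdev(baseline)
    run = 0
    for value in recent:
        run = run + 1 if value > limit else 0
        if run >= min_run:
            return True
    return False


# Hypothetical I/O queue lengths sampled at a fixed interval.
normal_week = [2, 3, 2, 4, 3, 2, 3, 3]
print(sustained_degradation(normal_week, [3, 9, 3, 10, 3]))   # isolated spikes
print(sustained_degradation(normal_week, [3, 8, 9, 11, 12]))  # sustained rise
```

Requiring a run of samples keeps a one-off spike from being mistaken for a bottleneck, at the cost of reacting a little later.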
Another piece of this is recognizing when something is not a bottleneck at all but a design or application limit. As mentioned, no application can scale indefinitely, and limits exist naturally. There aren't printed guidelines for these limits, as each situation and installation is unique. Knowing when you are hitting an artificial limit due to a bottleneck versus a hard limit due to the application or your infrastructure is part experience and part understanding your environment and application.
In the next installment of this series on application performance issues, we'll go into detail about CPU, memory and network bottlenecks, and in the final installment, we'll help you identify bottlenecks in one of the biggest performance killers: storage.