When it comes to bottlenecks, the big four that come to mind are CPU, memory, network and storage. Since these are typically the infrastructure pieces that make up a virtual environment, it stands to reason that they are the main culprits. As we dive deeper into each of these, it's important to remember that they are all related: adjusting one can have both positive and negative effects on the others. Keeping them in balance is the first step toward ensuring the overall health of your application, and understanding what type of application you have is a critical part of that. Is your application transaction-heavy or memory-intensive? These application profiles come into play as you begin to identify bottlenecks.
High-volume transactional workloads need many processor cores to execute quickly; a long-running, mostly serial process instead needs a high clock speed. Both problems are CPU-related, but how you address them differs based on your environment. You most likely have two choices for your CPUs: many cores at a limited clock rate, or fewer cores at a high clock rate. Neither is inherently better or worse; it truly depends on what your application needs and whether it's multithreaded.
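Amdahl's law makes this trade-off concrete. The sketch below uses assumed numbers (a 95% parallel transactional workload versus a 20% parallel serial job) purely for illustration:

```python
# Amdahl's law: theoretical speedup from adding cores, given the fraction
# of the workload that can run in parallel. Numbers here are assumptions
# chosen to illustrate the cores-vs-clock-speed trade-off.
def speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Highly parallel transactional workload: 16 cores help a lot.
print(round(speedup(0.95, 16), 2))  # -> 9.14
# Mostly serial batch job: 16 cores barely help; clock speed matters more.
print(round(speedup(0.20, 16), 2))  # -> 1.23
```

The second workload gains almost nothing from extra cores, which is why a lower core count at a higher clock rate can be the better fit.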
If you weren't able to match your CPUs to the application workload beforehand, here are a few tips to resolve bottlenecks:
First, you can add share weight to key VMs and create resource pools for critical VMs that require additional resources. Conversely, you can pin VMs to physical CPUs to reserve those resources outright; just remember that pinning a VM to a physical CPU removes the ability to live-migrate it. Next, mix production workloads with test/dev workloads on the same cluster or physical servers. This lets you throttle test/dev workloads when production applications need more CPU cycles; if a cluster is all production, you have nothing from which to borrow resources. Finally, look for misconfigured VMs that have been allocated resources they don't need, whether CPU reservations or overallocated cores. Even if a VM isn't using them, overallocated resources can still rob your infrastructure of performance by causing resource constraints.
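The idea behind share weight is proportional allocation under contention. This is a simplified sketch, not any hypervisor's actual scheduler; the VM names and share values are made up for illustration:

```python
# Proportional-share sketch: when the host is contended, each VM is
# entitled to CPU time in proportion to its share weight.
def cpu_entitlement(shares, total_mhz):
    """Map VM name -> entitled MHz, proportional to share weight.

    `shares` is a dict of VM name -> share weight (hypothetical values).
    """
    total_shares = sum(shares.values())
    return {vm: total_mhz * s / total_shares for vm, s in shares.items()}

# A VM with double the shares gets double the CPU under contention.
alloc = cpu_entitlement({"prod-db": 2000, "test-web": 1000, "dev-ci": 1000}, 8000)
print(alloc)  # prod-db -> 4000 MHz; test-web and dev-ci -> 2000 MHz each
```

This is also why mixing production with test/dev pays off: the lower-share test/dev VMs are the ones squeezed first when production needs cycles.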
Memory is one of the most common places bottlenecks occur, and, unfortunately, installing additional memory is expensive. Memory bottlenecks are easy to identify because one of the first things the OS does under memory pressure is swap to disk. A telltale sign is disk activity jumping while performance nose-dives. In the event of a memory bottleneck, take the following steps:
First, ensure your guest OS tools are installed and up to date. It may seem like a small thing, but these tools keep your VMs memory-efficient. Then, address overallocation of memory to your VMs. VM-aware monitoring tools let you see inside your guests and distinguish the machines that are actually using their allocated memory from those that are simply caching it. It's important to note that, even with guest tools installed, Windows VMs can cache a lot of memory "just in case." Monitoring in-use (active) memory will help you identify them. Also, consider using limits. Reservations can be used as well, but I would shy away from them because of the issues they cause with allocated but unused resources. Limits are almost always a good idea, especially for test and dev servers in a mixed environment.
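The right-sizing check above can be sketched as a simple filter. The VM names, memory figures and 50% threshold below are all hypothetical; a real monitoring tool would supply the active-memory numbers:

```python
# Right-sizing sketch: flag VMs whose active ("in use") memory is far
# below their allocation -- likely candidates for reclaiming memory.
def overallocated(vms, threshold=0.5):
    """Return names of VMs using less than `threshold` of allocated memory.

    Each entry in `vms` is a dict with hypothetical keys
    'name', 'allocated_mb' and 'active_mb'.
    """
    return [vm["name"] for vm in vms
            if vm["active_mb"] / vm["allocated_mb"] < threshold]

vms = [
    {"name": "win-file01", "allocated_mb": 16384, "active_mb": 2048},   # mostly cache
    {"name": "sql-prod",   "allocated_mb": 16384, "active_mb": 14336},  # genuinely busy
]
print(overallocated(vms))  # -> ['win-file01']
```

The Windows file server here holds 16 GB but actively touches only 2 GB, exactly the "caching just in case" pattern worth investigating.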
Often, non-network admins blame the network for slow performance or bottlenecks. More often than not, that's simply not the case. It's not that it can't happen, but modern networks are usually too fast to be the bottleneck; genuine traffic congestion on a healthy 1 Gb or 10 Gb network is rare.
That doesn't mean the network is completely off the hook. It is, after all, the gateway to everything else in your infrastructure. The network may not be guilty of performance issues, but misconfiguration can cause unintentional bottlenecks. While this is more of a configuration or change management issue, and is something that needs to be accounted for and reviewed in the event of an application bottleneck, it shouldn't be the first thing people blame.
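A back-of-the-envelope calculation shows why raw link speed is rarely the problem. The 90% efficiency factor and transfer sizes below are assumptions for illustration:

```python
# Rough transfer-time estimate: even a 1 Gb link moves a gigabyte in
# seconds, which is why congestion alone rarely explains a bottleneck.
def transfer_seconds(size_gb, link_gbps, efficiency=0.9):
    """Seconds to move size_gb gigabytes over a link_gbps link.

    `efficiency` is an assumed factor for protocol overhead.
    """
    return (size_gb * 8) / (link_gbps * efficiency)

print(round(transfer_seconds(1.0, 1.0), 1))   # -> 8.9 seconds on 1 GbE
print(round(transfer_seconds(1.0, 10.0), 2))  # -> 0.89 seconds on 10 GbE
```

When an application takes minutes to move a gigabyte on links like these, the cause is almost always elsewhere: misconfiguration, storage or the application itself.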
In the next article in this series, we'll help you identify bottlenecks in one of the biggest performance killers: storage.
In the first article of this series, we discussed the difference between bottlenecks and faults, and how to understand performance data and application limits.