SAN FRANCISCO -- There are several ways to troubleshoot and prevent performance problems in a virtual environment, including understanding the user experience, examining the application stack and rightsizing your hosts and VMs from the get-go. Which approach you should take depends on the specific issue you're dealing with.
In a session at VMworld 2019 called "VM Performance Demystified and Best Practices to Optimize," speakers Akshay Kalia and Vikram Nag, both staff technical support engineers at VMware, shared their strategies for investigating the scope and cause of performance issues, and offered recommendations for avoiding future problems.
Track down the performance killer
First, decide whether to approach the problem from the top down or from the bottom up. The top-down approach starts with the application stack, and then the OS stack, VM stack, ESXi stack and finally the infrastructure.
The bottom-up approach starts with the infrastructure, and then the ESXi stack, VM stack, OS stack and finally the application stack. In most instances, it makes sense to start from the top down, because it's a smaller scope, but this isn't always the case.
"vSphere admins don't always have insight and access to the VM's OS, so it's good to cut it in half and first try to decide if [the problem] is in the top half -- the VM -- or in the bottom half -- the hypervisor," said Rob Bastiaansen, an independent trainer and consultant based in the Netherlands.
Next, scope out the issue by asking the right people the right questions, and take advantage of useful tools, such as the ESXi command line and vRealize Operations (vROps). You should also generate a metric tree with CPU, memory, storage and network stats.
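One way to picture a metric tree is as a nested mapping from each resource area to the counters you plan to review. The sketch below is illustrative only: the counter names are placeholders chosen for this example, not a list given in the session or exported by any VMware tool.

```python
# Hypothetical metric tree: resource area -> counters to review.
# Counter names are illustrative placeholders, not an official list.
METRIC_TREE = {
    "cpu": ["usage", "ready", "co-stop"],
    "memory": ["active", "ballooned", "swapped"],
    "storage": ["latency", "iops", "throughput"],
    "network": ["throughput", "dropped packets"],
}


def flatten(tree):
    """Return every (area, counter) pair so none is skipped during a review."""
    return [(area, counter) for area, counters in tree.items() for counter in counters]


def render(tree):
    """Format the tree as indented text for a troubleshooting checklist."""
    lines = []
    for area, counters in tree.items():
        lines.append(area)
        lines.extend(f"  - {counter}" for counter in counters)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(METRIC_TREE))
```

Keeping the tree as plain data makes it easy to extend with whatever counters your own tools actually report.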
Get insight on the user's experience
Getting a clear picture of what the user sees on his or her end goes hand in hand with the approach you take in investigating the stack.
"Finding the killer app has to start with the customer experience rather than the technology," said Brian Kirsch, an IT architect and instructor at Milwaukee Area Technical College. "While metrics help and provide data, with the disturbed application design … you can't simply depend on the metrics of a slice of the environment."
Ask the basic questions -- including who, what, where, when and how -- and view the user's data to cover your bases. Also, collect the data for all layers at the same time to get the most accurate picture.
"It's a lot easier when it's happening … versus having to go back in time to try to figure out what was happening, because then the pieces are harder [to gather]," said a systems engineer at the session.
It can also be helpful to reach out to the application team, guest OS team, infrastructure team and/or the rest of the virtualization team to collect data.
Take advantage of troubleshooting tools
Several tools can help troubleshoot the issue, but which one(s) you should use depends on a number of factors, including the infrastructure platforms you use, the types of applications you run and the budget you have for third-party management products.
A good place to start is the ESXi command line. Useful commands include net-stats, vsish, vscsiStats and esxtop. The last of these is ESXi's counterpart to top, a task manager program included in many Linux distributions.
Other options include iPerf, a network testing tool that simulates application I/O to the network, and Iometer, a storage testing tool that simulates application I/O to storage.
You can also take advantage of the Windows Performance Monitor utility, if it makes sense to do so.
VMware vROps provides intelligent operations management with application-to-storage visibility across physical, virtual and cloud infrastructures.
"Feedback from the app owners would be the biggest thing [for optimizing VM performance], which isn't very productive compared to what you can get with vRealize … but we're adapting," said a system administrator at the show. "We're monitoring and looking at the rightsizing picture."
Manage monster VMs effectively
Monster VMs are a common cause of performance problems in virtual environments. Monster VMs are useful for applications that need high CPU and memory resources, and they have low operational and capital expenditure. But they also have higher virtualization overheads and are more prone to resource scheduling issues.
"One of greatest pitfalls with large VMs is that they can stretch to the borders of what your hardware supports, or can even go beyond that," Bastiaansen said. "Therefore, as an administrator, you need to know what's under the hood. How many CPUs do you have and how many cores per socket?"
When deploying monster VMs, refer to the CPU and memory demand metrics in vROps to rightsize the VM. To rightsize the host, make sure it will still have free cores when the monster VM is using all the resources assigned to it.
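The host-side check can be reduced to simple arithmetic. The following is a minimal sketch, assuming one vCPU keeps one physical core busy at full load; the function names and the `min_free_cores` margin are hypothetical choices for this example, not figures from the session.

```python
def free_cores_at_full_load(host_physical_cores, monster_vcpus):
    """Physical cores left over if the monster VM keeps every vCPU busy."""
    return host_physical_cores - monster_vcpus


def host_is_rightsized(host_physical_cores, monster_vcpus, min_free_cores=2):
    """True when the host keeps at least min_free_cores spare at full load.

    min_free_cores is an assumed safety margin for this sketch,
    not a VMware-recommended value.
    """
    spare = free_cores_at_full_load(host_physical_cores, monster_vcpus)
    return spare >= min_free_cores


if __name__ == "__main__":
    # A 32-core host with a 24-vCPU VM leaves 8 cores for everything else.
    print(free_cores_at_full_load(32, 24), host_is_rightsized(32, 24))
```

The right margin depends on what else runs on the host, so treat the default as a starting point for discussion with the application team.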
"Properly document what size VMs admins can choose from when creating new ones," Bastiaansen said. "Or create templates with those sizes as a sort of menu that they can choose from."
Instead of using slot-based High Availability admission control when running monster VMs with reservations, configure specific resources. Have a conversation with the application teams to determine needs and set expectations. Also, configure your monster VM as if hyper-threading doesn't exist.
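Sizing as if hyper-threading doesn't exist amounts to comparing a VM's vCPU count with the host's physical cores rather than its logical CPUs. Here is a rough sketch of that comparison; the `Host` class and `fits_without_hyper_threading` function are hypothetical names for this example, and the model assumes two logical CPUs per physical core when hyper-threading is on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    sockets: int
    cores_per_socket: int
    hyper_threading: bool = True

    @property
    def physical_cores(self):
        return self.sockets * self.cores_per_socket

    @property
    def logical_cpus(self):
        # Assumes hyper-threading exposes two logical CPUs per physical core.
        return self.physical_cores * (2 if self.hyper_threading else 1)


def fits_without_hyper_threading(host, vcpus):
    """Size against physical cores only, ignoring the extra logical CPUs."""
    return vcpus <= host.physical_cores


if __name__ == "__main__":
    host = Host(sockets=2, cores_per_socket=16)
    # 48 vCPUs is under the 64 logical CPUs but over the 32 physical cores.
    print(host.logical_cpus, host.physical_cores, fits_without_hyper_threading(host, 48))
```

In this example a 48-vCPU VM looks acceptable against logical CPUs but fails the physical-core check, which is the case the recommendation is meant to catch.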
"With hyper-threading, the number of logical CPUs doubles, but not your real compute power," Bastiaansen said.
And finally, when it comes to preventing future VM performance issues, remember the basics:
"Follow the industry standards and the guidelines they give you, and you can't go wrong," said Joe Dutro, system administrator at Casper College, a community college in Casper, Wyoming.