The abstraction provided by a virtualization layer allows the server's resources to be overcommitted -- effectively allocating more resources and hosting more virtual machines than might otherwise be possible on a physical server. Overcommitment works because most workloads don't utilize all of their allocated resources, so the server is rarely overtaxed. However, when overcommitment spreads the server's resources too thin, shortages can occur and VM performance can suffer, forcing IT administrators to troubleshoot the performance problems. This tip highlights the esxtop command in VMware ESXi as a troubleshooting tool for processor and memory utilization.
Running the esxtop utility
The esxtop utility provides details about the ways that ESXi is using a server's resources. Administrators with root user privileges can launch the utility from the secure ESXi shell. What follows is the formal syntax for esxtop and its most common switches:
esxtop [-h] [-v] [-s] [-a] [-c file] [-d delay] [-n cycles]
Each switch has a distinct purpose:
- -h: Display the command-line options for esxtop (help).
- -v: Display the esxtop version number.
- -s: Use esxtop in secure mode.
- -a: Show all statistics.
- -c file: Use a specific configuration file.
- -d delay: Set the delay between updates, in seconds.
- -n cycles: Set the number of cycles (iterations) to run esxtop.
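As a rough illustration of how the delay and cycle switches combine, the Python sketch below assembles an esxtop argument list. The helper name build_esxtop_args is hypothetical, and the sketch assumes each switch is passed with a leading hyphen (-c, -d, -n), as is conventional for command-line tools. It only builds the argument list; it does not launch esxtop.

```python
def build_esxtop_args(delay=None, cycles=None, config_file=None):
    """Assemble an esxtop argument list (hypothetical helper, not part of ESXi)."""
    args = ["esxtop"]
    if config_file is not None:
        args += ["-c", str(config_file)]   # use a specific configuration file
    if delay is not None:
        args += ["-d", str(delay)]         # delay between updates
    if cycles is not None:
        args += ["-n", str(cycles)]        # number of cycles before exiting
    return args


# Example: update every 5 seconds and stop after 3 cycles.
command = build_esxtop_args(delay=5, cycles=3)
print(" ".join(command))
```

Keeping the arguments in a list rather than one string makes it easier to hand them to a process launcher later without worrying about quoting.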
The esxtop utility starts in "interactive" mode by default and generates a report of processor, memory, disk and network statistics, which begins with something like the following:
11:23am up 21:15, 10 worlds, load average: 0.03, 0.01, 0.00, 0.00
PCPU: 3.56%, 2.23% : 2.95% used total
LCPU: 3.55%, 0.40%, 1.89%, 0.05%
MEM: 845021 managed (KB), 260446 free(KB) : 69.17% used total
SWAP: 1059554 av(KB), 0 used (KB), 1049001 free(KB): 0.00 MBr/s, 0.00 MBw/s
VCPUID WID WTYPE %USED %READY %EUSED %MEM SWPD
120 120 idle 50.22 0.00 50.22 0.00 0.00
121 121 idle 27.45 0.00 27.45 0.00 0.00
The report typically contains more detail than this, but IT staff can assess the processor and memory load from these first few lines of information.
Determining processor utilization
Locate the "load average" statistics on the very first line of the report. There are four entries corresponding to the average use for all physical processors on the server over five-second, one-minute, five-minute and fifteen-minute periods. Each number is a normalized value in which 1.00 corresponds to 100% utilization.
For example, a 0.25 entry is 25%, a 0.50 entry is 50%, a 1.00 entry is 100%, a 2.00 entry is 200% and so on. The example report shown above represents an extremely light processor load: only 3% (0.03) utilization over the last five seconds. Alternatively, look at the total utilization of all physical CPUs, which is the last value on the report's PCPU line. The 2.95% figure there agrees with the short-term load average of 0.03.
Simply stated, if the load average is less than 1.00, the server's CPUs are less than 100% utilized, and so processor resource shortages are probably not responsible for performance problems (though it may be possible to allocate free CPU cycles to an underperforming VM as testing continues). Utilization up to 80% (0.80) is usually considered acceptable on a virtualized server, but higher utilization may forewarn of performance issues.
Conversely, if the load average is more than 1.00, the server's CPUs are more than 100% utilized and at least some VMs may not be receiving adequate processor resources. In this situation, increase the available processor capacity by upgrading to newer CPUs (perhaps with more cores, if the server motherboard supports them) or move some workloads off the overloaded server (workload balancing) until the server can be replaced with a more powerful model.
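The thresholds described above can be sketched in a few lines of Python. This is only an illustration: the function names and the labels "ok", "watch" and "overcommitted" are invented here, and the parser assumes the header line looks exactly like the example report.

```python
import re


def parse_load_averages(header_line):
    """Extract the load-average values from the first line of the report."""
    match = re.search(r"load average:\s*(.+)$", header_line)
    if match is None:
        raise ValueError("no 'load average' field found")
    return [float(value) for value in match.group(1).split(",")]


def classify_load(value, acceptable=0.80, saturated=1.00):
    """Label a normalized load value using the thresholds from the text."""
    if value > saturated:
        return "overcommitted"   # CPUs are more than 100% utilized
    if value > acceptable:
        return "watch"           # above the usual 80% comfort level
    return "ok"


header = "11:23am up 21:15, 10 worlds, load average: 0.03, 0.01, 0.00, 0.00"
loads = parse_load_averages(header)
labels = [classify_load(value) for value in loads]
print(list(zip(loads, labels)))
```

For the example report, every entry falls in the "ok" band, which matches the light load described in the text.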
Technicians can also delve further into the esxtop report to learn which VMs are having trouble scheduling time on a physical CPU by examining the %READY entry for each VM. VMs with %READY entries over 5% are experiencing noticeable delays in accessing the processor and deserve closer scrutiny. In addition, review the %USED column to locate potential resource hogs or workloads that may require additional processor resources; a large %USED entry means the VM is using a large portion of the processor resources. For example, the VM dubbed 120 is using 50.22% of the physical processor resources, but is not experiencing any problems scheduling time on the CPU (its %READY entry is 0.00).
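The per-entry check can be sketched in Python as follows. The function names are hypothetical, and the parser assumes the table is whitespace-separated with the column names shown in the example report; real esxtop output may be laid out differently, so treat this as a starting point rather than a finished tool.

```python
NUMERIC_COLUMNS = ("%USED", "%READY", "%EUSED", "%MEM", "SWPD")


def parse_world_table(text):
    """Turn the whitespace-separated table into a list of dicts keyed by column."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, data = lines[0], lines[1:]
    rows = []
    for fields in data:
        row = dict(zip(header, fields))
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def ready_over(rows, threshold=5.0):
    """Return the WIDs whose %READY exceeds the threshold (5% in the text)."""
    return [row["WID"] for row in rows if row["%READY"] > threshold]


def busiest(rows):
    """Return the row with the largest %USED value."""
    return max(rows, key=lambda row: row["%USED"])


# Rows copied from the example report.
report = """
VCPUID WID WTYPE %USED %READY %EUSED %MEM SWPD
120 120 idle 50.22 0.00 50.22 0.00 0.00
121 121 idle 27.45 0.00 27.45 0.00 0.00
"""
rows = parse_world_table(report)
delayed = ready_over(rows)   # entries waiting noticeably for a CPU
top = busiest(rows)          # heaviest consumer of processor time
print(delayed, top["WID"], top["%USED"])
```

With the two rows from the example report, nothing exceeds the 5% %READY threshold, and entry 120 is the largest consumer.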
Determining memory utilization
Locate the MEM line in the esxtop report (usually the fourth line of the statistics) and review the amount of free memory and the percentage of total physical memory in use. In the example report, 260446 KB is free and 69.17% of memory is in use. If the percentage is extremely high or no free memory remains, then the server's memory is probably overtaxed. In this situation, add memory modules to increase the total memory available on the server, or move some workloads off the overloaded server (workload balancing) until the server can be upgraded or replaced with a more powerful model.
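A short Python sketch of reading the MEM line follows. The function name and the regular expression are assumptions based solely on the layout of the example report, and the 90% alert level is an arbitrary placeholder, since the text does not define what "extremely high" means.

```python
import re

MEM_PATTERN = re.compile(
    r"MEM:\s*(\d+)\s*managed\s*\(KB\),\s*(\d+)\s*free\s*\(KB\)\s*:\s*"
    r"([\d.]+)%\s*used total"
)


def parse_mem_line(line):
    """Return managed KB, free KB and the reported used percentage."""
    match = MEM_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like an esxtop MEM line")
    managed_kb, free_kb, used_pct = match.groups()
    return {
        "managed_kb": int(managed_kb),
        "free_kb": int(free_kb),
        "used_pct": float(used_pct),
    }


def memory_overtaxed(mem, alert_pct=90.0):
    """Flag the server when no memory is free or usage passes alert_pct.

    alert_pct is a placeholder value, not a VMware recommendation.
    """
    return mem["free_kb"] == 0 or mem["used_pct"] >= alert_pct


mem = parse_mem_line("MEM: 845021 managed (KB), 260446 free(KB) : 69.17% used total")
print(mem, memory_overtaxed(mem))
```

The threshold is a parameter so that each site can set the level it considers too high.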
Next, review the SWAP line in the esxtop report (usually the fifth line of the statistics) and check the amount of swap space used, as well as the volume of data read from and written to the swap file per second (MBr/s and MBw/s). Using swap space is acceptable, perhaps even essential, but a large amount of used swap space combined with high read and write rates indicates substantial disk activity, which can impair the performance of some workloads. The example esxtop report shown above reports no substantial swap activity for the server.
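The SWAP line can be read the same way. The Python sketch below is illustrative only: the function names are hypothetical and the pattern assumes the exact field layout of the example report.

```python
import re

SWAP_PATTERN = re.compile(
    r"SWAP:\s*(\d+)\s*av\s*\(KB\),\s*(\d+)\s*used\s*\(KB\),\s*"
    r"(\d+)\s*free\s*\(KB\)\s*:\s*([\d.]+)\s*MBr/s,\s*([\d.]+)\s*MBw/s"
)


def parse_swap_line(line):
    """Return swap sizes in KB and the read/write rates in MB per second."""
    match = SWAP_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like an esxtop SWAP line")
    available, used, free, read_rate, write_rate = match.groups()
    return {
        "available_kb": int(available),
        "used_kb": int(used),
        "free_kb": int(free),
        "read_mb_s": float(read_rate),
        "write_mb_s": float(write_rate),
    }


def swap_in_use(swap):
    """True when any swap space is used or any swap I/O is occurring."""
    return swap["used_kb"] > 0 or swap["read_mb_s"] > 0 or swap["write_mb_s"] > 0


swap = parse_swap_line(
    "SWAP: 1059554 av(KB), 0 used (KB), 1049001 free(KB): 0.00 MBr/s, 0.00 MBw/s"
)
print(swap, swap_in_use(swap))
```

Because some swap use is normal, a check like swap_in_use is best treated as a prompt to look at the rates more closely, not as an alarm in itself.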
When swap activity is the suspected culprit, delve further into the esxtop report to learn which VMs are experiencing excessive swap file activity by reviewing the SWPD column. Heavy swap file use may result in poor performance for the VM. Increasing the amount of memory allocated to the afflicted VM should reduce its reliance on the swap file and boost VM performance. If more memory is not immediately available (because the server is already overtaxed), technicians must rebalance workloads, upgrade memory or upgrade the entire server to rectify the issue. For reference, the %MEM column reports the percentage of server memory used by the VM.
Since esxtop launches in "interactive" mode by default, it's important to quit the utility (using the Q key) before closing the secure shell or moving on to other command-line operations. The esxtop utility is just one of the tools available with VMware ESX/ESXi. For example, the vmkusage tool provides a graph that tracks statistics for the physical server and each VM. The Web-based VMware Management Interface also allows administrators to track the status of virtual machines as long as the server's IP address and login credentials are available.