Esxtop: Troubleshooting ESX servers with performance metrics

Colorful pie charts depicting ESX server performance look great, but sometimes you need more powerful metrics. The esxtop command retrieves key performance metrics.

Many Windows admins love the PerfMon command. Its graphical approach to performance monitoring simplifies Hyper-V administration.

Pulling performance metrics off a VMware ESX server can be more complicated, and unfortunately there isn't an ESX equivalent to PerfMon. Charts and graphs from the vSphere Client are useful, but sometimes you need more detailed information. In these situations, try the esxtop command to compile sophisticated CPU and memory statistics.


The esxtop command is related to the Unix top command, but it was reconfigured to provide more virtualization-specific information. Entering esxtop at the command line produces a list of VMware-specific processes and statistics for individual virtual machines (VMs).

Esxtop can measure virtual environment performance in four ways:

  • CPU mode (press the c key);
  • memory mode (press the m key);
  • storage mode (press the d key); and
  • network mode (press the n key).

The information retrieved by esxtop
To demonstrate the usefulness of esxtop, log in to an ESX server, and enter CPU mode to display the CPU counters. The information at the top of the screen describes the server's CPU performance. At the bottom of the screen, the columns show the CPU statistics for each process and running VM. Some of these numbers require a bit of extra explanation.

At the very top of the screen, you will find the following information:

  • current time;
  • time since the last reboot;
  • uptime; and
  • the "worlds" (that is, scheduled entities or processes) that are running.

The top line also displays the CPU load averages over the prior one, five and 15 minutes. Each average is shown as a decimal number and represents the CPU utilization of running and ready-to-run worlds. As the decimal number increases, more CPU resources are consumed -- with a measure of 1.00 meaning that all physical CPUs are fully utilized. This number can exceed 1.00 if the available CPUs cannot process all of the incoming requests. A measure of 2.00, for example, means that twice as many CPUs are needed to process the waiting CPU requests.
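The arithmetic behind these load averages can be sketched in a few lines of Python. This is only an illustration: the function name cpus_needed and the eight-CPU host are hypothetical, and the sketch assumes, as described above, that a load average of 1.00 corresponds to all physical CPUs being fully utilized.

```python
import math


def cpus_needed(load_average: float, physical_cpus: int) -> int:
    """Estimate how many physical CPUs the current demand would keep busy.

    Assumes a load average of 1.00 means every physical CPU is fully used.
    """
    if load_average < 0 or physical_cpus < 1:
        raise ValueError("load_average must be >= 0 and physical_cpus >= 1")
    return math.ceil(load_average * physical_cpus)


# Hypothetical host with 8 physical CPUs
for load in (0.50, 1.00, 2.00):
    print(f"load {load:.2f} -> about {cpus_needed(load, 8)} CPUs of demand")
```

On this made-up host, a load average of 2.00 corresponds to roughly 16 CPUs of demand, twice what the server actually has.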

Immediately below the first line are two additional rows:

  • PCPU(%). This metric identifies the percentage of CPU utilization per physical CPU, followed by the total average across all physical CPUs. This data displays as a series of numbers, one for each physical CPU, followed by a Used Total, which is the average of the other numbers. This information shows how CPU requests are spread across the available physical CPUs. Monitor this statistic if VM CPU requests start to target a particular physical CPU more than the others.
  • CCPU(%). This line displays the CPU time as reported by the ESX server. This number differs from PCPU(%) because it's measured as a user time variable, not from the CPU's perspective.
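As a rough sketch of how the PCPU(%) row might be read, the Python snippet below averages a set of per-CPU readings into a Used Total and lists any physical CPU that sits well above that average. The function name summarize_pcpu, the 20-point skew_threshold and the sample readings are all assumptions made for illustration; none of them comes from esxtop itself.

```python
from statistics import mean


def summarize_pcpu(per_cpu_used, skew_threshold=20.0):
    """Return the average utilization and the CPUs well above that average.

    per_cpu_used: utilization percentages, one per physical CPU.
    skew_threshold: hypothetical margin, in percentage points above the
        average, that counts as a CPU being targeted more than the others.
    """
    if not per_cpu_used:
        raise ValueError("need at least one physical CPU reading")
    used_total = mean(per_cpu_used)
    hot = [i for i, pct in enumerate(per_cpu_used)
           if pct - used_total > skew_threshold]
    return used_total, hot


# Made-up readings for a four-CPU host, not real esxtop output
used_total, hot = summarize_pcpu([22.0, 18.0, 91.0, 25.0])
print(f"Used Total: {used_total:.1f}%  busiest PCPUs: {hot}")
```

The threshold is a judgment call; the point is only that a single physical CPU far above the Used Total is the pattern worth watching for.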

More esxtop metrics
Each process and VM has a separate line in the lower half of the screen. Each line represents the CPU utilization for the item on that row, with 13 CPU metrics gathered on each line. The metrics and their descriptions are listed below:

  • ID, GID, NAME and NWLD. These four counters display the ID of the row item, its resource pool ID, its name and the number of items in its resource pool. These counters are used mostly for uniquely identifying an activity -- process, VM or otherwise -- that's consuming CPU resources. For each activity, several types of CPU resources are identified in the remaining columns.
  • %USED. The percentage of CPU resources that are used by the item.
  • %RUN. The percentage of time that is scheduled for the item.
  • %SYS. The percentage of time that the kernel code is running on behalf of the item.
  • %WAIT. The percentage of time that the item is in a wait state.
  • %RDY. The percentage of time that the item is ready to run but is waiting for the CPU's attention. Note that this is very different from %WAIT, in which the item is in a wait state.
  • %IDLE. The percentage of time the item was idle.
  • %OVRLP. The percentage of time that system services spent working on behalf of other items while this item was scheduled.
  • %CSTP. The percentage of time that the item's virtual CPUs have stopped and are waiting to restart. This value is meaningful for multiprocessor VMs.
  • %MLMTD. The percentage of time the item was ready but specifically prevented from running to avoid violating CPU-limit settings.
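
To show how these columns might be put to work, the Python sketch below picks out the rows whose %RDY value is high, that is, items that were ready to run but were kept waiting for a CPU. The function name flag_cpu_contention, the 10% rdy_limit and the sample rows are hypothetical; the rows are made-up numbers keyed by the column names above, not captured esxtop output.

```python
def flag_cpu_contention(rows, rdy_limit=10.0):
    """Return the names of rows whose %RDY exceeds rdy_limit, highest first.

    rows: list of dicts keyed by esxtop column names (NAME, %RDY and so on).
    rdy_limit: hypothetical threshold; choose one that suits your environment.
    """
    waiting = [row for row in rows if row["%RDY"] > rdy_limit]
    waiting.sort(key=lambda row: row["%RDY"], reverse=True)
    return [row["NAME"] for row in waiting]


# Made-up sample rows for illustration only
sample = [
    {"NAME": "vm-web", "%USED": 35.0, "%WAIT": 60.0, "%RDY": 2.1},
    {"NAME": "vm-db", "%USED": 80.0, "%WAIT": 5.0, "%RDY": 14.7},
    {"NAME": "vm-batch", "%USED": 55.0, "%WAIT": 20.0, "%RDY": 22.3},
]
print(flag_cpu_contention(sample))
```

Note that vm-web has the highest %WAIT in this sample but is not flagged: a wait state is not the same as being ready and starved of CPU time, which is what %RDY measures.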

The esxtop command generates a large amount of raw data, which can overwhelm administrators. Once you train your eye, however, the esxtop command becomes a powerful tool for troubleshooting ESX server performance.

Greg Shields
 

Greg Shields is an independent author, instructor, Microsoft MVP and IT consultant based in Denver. He is a co-founder of Concentrated Technology LLC and has nearly 15 years of experience in IT architecture and enterprise administration. Shields specializes in Microsoft administration, systems management and monitoring, and virtualization. He is the author of several books, including Windows Server 2008: What's New/What's Changed, available from Sapien Press.


This was first published in August 2010