Virtual machine performance monitoring approaches and tools

To gauge virtual machine performance or detect failure, decide whether to monitor from inside or outside a VM and then choose a monitoring tool that works with your hypervisor.

When someone asks if there is a way to monitor their virtual machine (VM), my first question is always, "What do you want to monitor?" The second question I ask is, "Do you have any existing server monitoring tools that you can use, and if so, how do they work?" These two questions lead administrators down different VM monitoring paths.

Take, for example, the simple act of monitoring the up/down state of a service or detecting virtual machine failure. Hundreds of tools do this, with or without agents. Agent-based tools install an agent directly within the VM. Agentless tools tend to probe the VM from outside to determine, for example, whether a Web server is running or whether the machine can be pinged.

In either case, the tools help you determine whether the VM is up or down and whether its services are running. Remember, though, that either approach, probing a system or installing an agent, can incur license costs.
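
As a rough illustration of the agentless approach, the following Python sketch probes a TCP port from outside the VM and reports the service as up or down. It is only a minimal example, not how any particular monitoring product works; the function names tcp_port_is_open and service_state are hypothetical, and a TCP connection check is just one of several possible probes.

```python
import socket


def tcp_port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is an agentless probe: nothing is installed inside the VM.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def service_state(host: str, port: int, timeout: float = 2.0) -> str:
    """Map the probe result to the up/down state a monitoring tool reports."""
    return "up" if tcp_port_is_open(host, port, timeout) else "down"
```

For example, service_state("vm01.example.com", 80) would report whether a Web server on that (hypothetical) host accepts connections. A probe like this only shows that the port answers; it says nothing about how well the VM is performing.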

Accurately monitoring VM performance

In general, anything you can monitor within a physical system can be monitored within a VM, with one exception: performance. With virtual machine performance monitoring, it is actually better to monitor from outside the VM than from within, because data retrieved from within the VM may not be accurate. Although this data can be used as an estimate, you don't know if the raw numbers are correct. When a virtualization host over-commits CPU, the host's scheduler divides processor time among the VMs, but not necessarily in a contiguous fashion. In other words, a VM no longer spends all of its time on a CPU. However, CPU-specific performance counters within a guest operating system expect to be able to read data from the CPU all the time. The counters then compute their results on the assumption that every nanosecond is spent running on a CPU.

When CPUs are over-committed, a VM doesn't spend every nanosecond on a CPU; some nanoseconds are spent idle, waiting for a CPU to run on. Therefore, there is often a discrepancy between the numbers that hypervisor-level performance monitoring tools generate and those gathered within the VM. Since it's difficult for a VM to know when it has been made idle, this information is best retrieved from the hypervisor, outside the VM. The hypervisor knows this information and can give accurate numbers for the VM in question.
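
The arithmetic behind this discrepancy can be sketched with a toy model in Python. The numbers are invented for illustration, and the model makes a deliberately simple assumption: the guest cannot tell time spent waiting for a CPU apart from time spent running, so it counts both as busy. Real guest counters and hypervisor schedulers are more complicated, and the names CpuSample, hypervisor_view and guest_estimate_pct are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    interval_ms: float  # wall-clock length of the sampling interval
    run_ms: float       # time the VM actually executed on a physical CPU
    wait_ms: float      # time the VM was ready to run but waiting for a CPU


def hypervisor_view(sample: CpuSample) -> dict:
    """The hypervisor schedules the VM, so it can separate running from waiting."""
    return {
        "used_pct": 100 * sample.run_ms / sample.interval_ms,
        "wait_pct": 100 * sample.wait_ms / sample.interval_ms,
    }


def guest_estimate_pct(sample: CpuSample) -> float:
    """Assumed guest behavior: waiting time is indistinguishable from running
    time, so both are counted as CPU busy time."""
    return 100 * (sample.run_ms + sample.wait_ms) / sample.interval_ms


# Invented example: over one second, the VM ran for 400 ms and waited 200 ms.
sample = CpuSample(interval_ms=1000, run_ms=400, wait_ms=200)
print(hypervisor_view(sample))     # {'used_pct': 40.0, 'wait_pct': 20.0}
print(guest_estimate_pct(sample))  # 60.0
```

Under these assumptions the guest reports 60 percent CPU utilization, while the hypervisor shows 40 percent actually used plus 20 percent spent waiting for a CPU, which is the kind of gap only the hypervisor's numbers can reveal.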

Therefore, you need a VM performance monitoring tool that works with your hypervisor. Several are available for each hypervisor on the market. If you don't use one that works with your hypervisor, the data you receive will be only an estimate of your current VM performance.

