Tracking virtual machine (VM) performance, whether to pinpoint problems or just to produce meaningful reports of resource consumption, is a very complex task. That's because a virtual machine's behavior is closely tied to the underlying host, and because its performance depends heavily on what the other virtual machines on that host are doing.
Dealing with performance measurement and reporting issues is critical to the success of a virtualization adoption project. In my series on virtualization adoption, I've covered other key components in an adoption plan, including capacity planning, ROI calculation, backup, physical-to-virtual migration and more.
In the area of performance, as in the other areas I've covered, I see that the market currently offers few products that can really address customers' needs and problems.
Virtualization needs new metrics
Traditional ways of measuring performance in a data center don't apply successfully to virtual infrastructures. It's a matter of opinion, of course, whether virtualized servers are nearly identical to physical servers or completely different. Let's look at the situation.
First of all, seen from the inside, virtual machines offer all the traditional counters that a performance monitor needs and usually tracks. So, existing reporting products are good enough if you simply install their agents in every guest operating system.
In a virtual world, however, some of the numbers obtained this way are much less valuable, while others are simply meaningless.
A typical example is memory consumption and memory paging in a VMware ESX Server environment. VMware's flagship product has a special feature called ballooning. Thanks to ballooning, ESX can temporarily use for other purposes some of the memory that the system administrator assigned to a virtual machine. At any moment, a special driver included in VMware Tools can request memory from the guest operating system (OS), just like inflating a balloon; that memory is then freed and immediately reallocated to other VMs that need it. While this happens, the guest operating system is forced to page out, showing unexpected, slight performance degradation. When everything is back to normal, ESX deflates the balloon and gives the memory back to its original machine.
In the above scenario, we have a guest OS reporting incorrect memory and page file usage, which may lead to completely wrong deductions about how a virtual machine is performing.
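To make the ballooning effect concrete, here is a minimal Python sketch of how a reporting tool might reconcile a guest's own memory figure with the balloon size seen at host level. The field names and sample values are hypothetical and are not taken from any VMware API. The sketch also assumes that pages held by the balloon driver are counted as "in use" inside the guest.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    """One memory reading for a VM. All field names are hypothetical."""
    configured_mb: int  # memory assigned by the administrator
    guest_used_mb: int  # "in use" as reported inside the guest OS
    ballooned_mb: int   # memory held by the balloon driver (host-side view)


def workload_used_mb(sample: MemorySample) -> int:
    """Estimate memory used by the real workload.

    Assumption: ballooned pages appear as used memory in the guest,
    so they are subtracted from the guest's own figure.
    """
    return max(sample.guest_used_mb - sample.ballooned_mb, 0)


def guest_usage_pct(sample: MemorySample) -> float:
    """Memory usage as the guest alone would report it."""
    return 100.0 * sample.guest_used_mb / sample.configured_mb


def workload_usage_pct(sample: MemorySample) -> float:
    """Memory usage after correcting for the balloon."""
    return 100.0 * workload_used_mb(sample) / sample.configured_mb


# Illustrative numbers only.
sample = MemorySample(configured_mb=4096, guest_used_mb=3686, ballooned_mb=1024)
print(f"guest view:        {guest_usage_pct(sample):.0f}%")
print(f"workload estimate: {workload_usage_pct(sample):.0f}%")
```

With a balloon inflated, the guest-side percentage overstates what the workload itself needs, which is why the host-side figure has to be part of the report.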
Going further, it is easy to see that some other measurements make sense only in relation to what's happening on the host.
In a scenario where a virtual machine frequently reports excessive CPU usage, we can't simply conclude that it is time for a virtual hardware upgrade, add a second virtual CPU, and feel confident about an improvement.
Sometimes excessive vCPU usage means the virtual machine is not being served fast enough at the host level, which may require fine-tuning the hypervisor's resource management or increasing the number of physical CPUs. This can be discovered only by tracking specific values at the host level.
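The distinction between a busy guest and a starved guest can be sketched as a simple classification rule. This Python example is only an illustration. It assumes the host exposes a figure for how long a vCPU waits to be scheduled on a physical CPU, called `host_ready_pct` here. The function name, parameter names and thresholds are all hypothetical and would need tuning for a real environment.

```python
def diagnose_cpu(guest_cpu_pct: float, host_ready_pct: float,
                 busy_threshold: float = 85.0,
                 ready_threshold: float = 10.0) -> str:
    """Classify a high-CPU report using both guest and host figures.

    guest_cpu_pct  -- CPU usage as seen inside the guest OS
    host_ready_pct -- share of time the vCPU was runnable but waiting for
                      a physical CPU, measured at host level (assumed metric)
    Thresholds are illustrative defaults, not vendor recommendations.
    """
    if guest_cpu_pct < busy_threshold:
        return "ok"
    if host_ready_pct >= ready_threshold:
        # The guest looks busy, but much of the time it is waiting on the host.
        return "host contention: tune hypervisor scheduling or add physical CPUs"
    # The guest is busy and the host is serving it promptly.
    return "guest bound: consider adding a virtual CPU"


print(diagnose_cpu(guest_cpu_pct=95.0, host_ready_pct=25.0))
print(diagnose_cpu(guest_cpu_pct=95.0, host_ready_pct=2.0))
```

The same guest-side reading leads to two different recommendations depending on the host-side value, which is the reason guest counters alone are not enough.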
So, we need to change our measuring approach, but what exactly do we need to track?
In a high-density virtual data center, with tens of virtual machines on a single host, we must consider interdependencies and track the whole system as a single entity, rather than as a sum of elements.
And since the relationship between virtual machines and hosts becomes critical, reporting solutions have to handle the fluidity of a virtual data center, seamlessly adapting to hot or cold migrations of guest operating systems within the infrastructure.
Last, but not least, these products have to address scalability: when administrators have to consider the performance of thousands of virtual machines deployed on hundreds of hosts, reporting solutions must work in a fully automated mode and provide smart summaries that are still human-readable and meaningful.
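As a rough illustration of the kind of automated roll-up this implies, the following Python sketch groups per-VM readings by host and reports only the hosts that need attention. The record layout, the sample data and the 80% threshold are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (host, vm, cpu_pct). In practice these would come
# from whatever collector feeds the reporting system.
samples = [
    ("host-a", "vm-01", 35.0),
    ("host-a", "vm-02", 42.0),
    ("host-b", "vm-03", 91.0),
    ("host-b", "vm-04", 88.0),
    ("host-b", "vm-05", 79.0),
]


def summarize_by_host(rows):
    """Return {host: {"vms": count, "avg_cpu": mean, "max_cpu": peak}}."""
    per_host = defaultdict(list)
    for host, _vm, cpu_pct in rows:
        per_host[host].append(cpu_pct)
    return {
        host: {"vms": len(vals), "avg_cpu": mean(vals), "max_cpu": max(vals)}
        for host, vals in per_host.items()
    }


def hosts_needing_attention(summary, avg_threshold=80.0):
    """List hosts whose average VM CPU usage exceeds the threshold."""
    return sorted(h for h, s in summary.items() if s["avg_cpu"] > avg_threshold)


summary = summarize_by_host(samples)
for host in hosts_needing_attention(summary):
    s = summary[host]
    print(f"{host}: {s['vms']} VMs, avg {s['avg_cpu']:.1f}%, peak {s['max_cpu']:.1f}%")
```

Because each sample carries the host it was taken on, a VM that migrates is simply counted under its new host in the next summary.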
Populating an almost empty segment
The performance tracking and reporting segment is one of the emptiest in today's virtualization industry. Partly, that's because of the complexity of the task. Demand is also still small. Finally, there's little awareness that traditional solutions are quickly becoming inadequate.
Obviously, virtualization platform vendors offer enhanced reporting tools (of varying quality), but at the moment none of them is addressing customers' needs with a serious, dedicated solution.
Right now, we have to look to third-party virtualization performance tracking products, but ISVs provide only a few products, each addressing a limited segment of the market. Here are three that I've reviewed:
- There's vizioncore, which focuses exclusively on VMware with its esxCharter. This product provides many charts and a tracking history of virtual machine and host performance. It's a very good entry-level product. Vizioncore also offers a free edition, which gives low-budget departments a decent ability to understand what's happening in their infrastructure.
- Devil Mountain Software (DMS) tries to reach a much wider audience with its Clarity Suite 2006, which supports hardware virtualization solutions (VMware and Microsoft, but only with Windows-based virtual machines) as well as application virtualization ones (Softricity, Altiris). Clarity Suite is a hosted solution focused more on virtualized workload profiling, comparing performance by means of a scoring system. The solution does some simple correlation between virtual machine and host metrics, useful for capacity planning and what-if scenarios, but it's still far from being the most complete reporting system for virtualized environments. Like vizioncore, DMS also offers a free version of Clarity Suite, which is unfortunately very limited in the number of deployable agents and in features.
- A new entry, Netuitive, focuses on VMware ESX Server only (as does vizioncore), but offers innovative features. For instance, its SI solution automatically profiles virtual machine and host performance, creating behavior profiles, which it correlates and uses to recognize odd behavior. As soon as such behavior appears, Netuitive SI reacts, asking the VMware infrastructure to reconfigure its resource pools, so that performance bottlenecks are addressed immediately, well before any human intervention.
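The general idea of learning a behavior profile and flagging departures from it can be illustrated with a deliberately simple baseline. This Python sketch is not Netuitive's algorithm, which is not described here. It assumes a profile made of the mean and standard deviation of past readings, and a hypothetical cut-off of three standard deviations.

```python
from statistics import mean, stdev


def build_profile(history):
    """Build a baseline (mean, standard deviation) from past readings."""
    if len(history) < 2:
        raise ValueError("need at least two readings to build a profile")
    return mean(history), stdev(history)


def is_odd(value, profile, max_sigmas=3.0):
    """Return True when a reading deviates too far from the baseline."""
    baseline, spread = profile
    if spread == 0:
        # A perfectly flat history: any change at all counts as odd.
        return value != baseline
    return abs(value - baseline) / spread > max_sigmas


# Hypothetical CPU readings (percent) for one VM during normal operation.
history = [40.0, 42.0, 38.0, 41.0, 39.0, 40.0]
profile = build_profile(history)

print(is_odd(41.5, profile))  # close to the usual range
print(is_odd(75.0, profile))  # far outside the usual range
```

A real product would also need to account for time-of-day patterns and for correlations between VMs and hosts, which this sketch leaves out.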
Looking ahead, I think that performance reporting should be the first aspect where data center automation takes hold. Let's keep our fingers crossed.
About the author: Alessandro Perilli is a recognized IT security and virtualization technology analyst who is CISSP certified and is also certified in Check Point, Cisco, Citrix, CompTIA, Microsoft, and Prosoft. In 2006 he received the Microsoft Most Valuable Professional (MVP) award for security technologies. Perilli pioneered modern virtualization evangelism, and is the founder of the well-known blog virtualization.info.