System management software routinely collects key performance metrics of virtualized server hardware, including the number of processors, processor utilization, memory capacity and other factors. In many cases, these metrics are collected from systems across local and remote data centers, then processed and reported so that IT administrators can understand current computing conditions and future trends. But there are circumstances when tools fail to receive some or all of the system's hardware metrics, and IT professionals must troubleshoot the root cause of any setup or compatibility problems. This tip highlights the three main areas where problems can strike: software support, hardware support and network connectivity.
Software support snags
Although many system management tools can retrieve metrics from a wide range of hardware, the process of system and hardware identification is not always automatic, especially in a heterogeneous data center. Some system management tools (such as IBM's Systems Director) must first perform a formal hardware inventory in order to "discover" the available systems and related components. Skipping this inventory is an easy oversight for admins to make after installing new systems.
System management tools can vary dramatically in inventory speeds and update times, so always refer to the documentation that accompanied your management software for detailed instructions on hardware inventory procedures and precautions. It may take several minutes to complete an inventory and update the tool's database before metrics will start appearing.
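One way to confirm that a missing inventory is the cause is to compare the hosts you expect to manage against the hosts the tool has actually discovered. The sketch below assumes both lists can be exported as plain hostnames. The function names and sample hostnames are hypothetical, and real tools expose inventory data in their own formats.

```python
def _normalize(hosts):
    """Lowercase and trim hostnames so 'ESX01 ' and 'esx01' compare equal."""
    return {h.strip().lower() for h in hosts}


def find_undiscovered(expected_hosts, discovered_hosts):
    """Return expected hosts that are absent from the tool's inventory."""
    return sorted(_normalize(expected_hosts) - _normalize(discovered_hosts))


if __name__ == "__main__":
    expected = ["esx01", "esx02", "esx03"]  # hypothetical hostnames
    discovered = ["ESX01", "esx03 "]        # hypothetical inventory export
    print(find_undiscovered(expected, discovered))
```

Any host the function returns is a candidate for a fresh inventory run before deeper troubleshooting.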
In other cases, there may be a more serious compatibility issue between the systems management software, the current hypervisor and the data center hardware. Every hardware platform is designed differently and offers a different suite of features, so it is unlikely that a single software tool will acquire and report the same metrics the same way from every system. Start by checking with the management software vendor and verifying the hardware compatibility against your current software tool.
Sometimes the compatibility problem lies in the hypervisor rather than the server hardware. For example, an older systems management tool that is not "virtualization aware" may not acquire hardware metrics from virtualized servers. The problem may also be limited to specific hypervisors or hypervisor versions. For example, the tool may work with VMware ESXi but not Citrix XenServer.
Regardless of whether the problem is in hardware or hypervisor compatibility, the fix is the same -- look for a patch or upgrade from the systems management vendor that can address the problem.
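A compatibility check of this kind can be sketched as a lookup against the vendor's published support matrix. The example below is only an illustration: the matrix contents and version numbers are invented, and the function names are hypothetical. Take the real values from your vendor's documentation.

```python
# Hypothetical support matrix: hypervisor name -> minimum supported version.
# These values are invented for illustration only.
SUPPORT_MATRIX = {"VMware ESXi": (5, 0)}


def parse_version(text):
    """Turn a dotted version string such as '5.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_supported(hypervisor, version, matrix=SUPPORT_MATRIX):
    """Return True if the hypervisor is listed and meets the minimum version."""
    minimum = matrix.get(hypervisor)
    if minimum is None:
        return False  # hypervisor is not in the matrix at all
    actual = parse_version(version)
    # Pad both tuples with zeros so '5' compares equal to '5.0'.
    width = max(len(actual), len(minimum))
    actual += (0,) * (width - len(actual))
    minimum += (0,) * (width - len(minimum))
    return actual >= minimum
```

A host that fails this check is a candidate for a patch or upgrade request to the vendor rather than further local troubleshooting.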
Hardware compatibility problems
If a systems management tool fails to provide metrics from a particular server model, chances are that the tool is not fully compatible with every platform in your heterogeneous data center. This problem normally appears when the management tool is first deployed or in the wake of a hardware change within the environment.
The problem is that a single tool may not be able to provide the same level of granularity for every metric on every model of server -- this makes heterogeneous data centers some of the most challenging environments to manage. This problem typically does not appear in homogeneous data centers because the systems management tool only needs to address one (or only a few) hardware platforms.
If the problem only surfaces after general deployment of the software tool, the enterprise is unlikely to recover its investment in the management software, and it is just as unlikely to spend additional capital replacing problematic servers. The only practical workaround at that point is to seek alternative deployment models. For example, it might be possible to deploy agents on problematic systems rather than rely on a bare-metal installation and automatic hardware identification. Always check with the software vendor to discuss the problem, formulate workarounds and perhaps ask the vendor for a future patch to fix the issue.
You can avoid this problem by testing the systems management tool in advance using a lab deployment that represents a cross-section of systems in production.
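To choose a lab cross-section, one simple approach is to pick a single host for each distinct combination of server model and hypervisor found in production. The sketch below assumes an asset list is available as simple records. The field names, model names and hostnames are hypothetical.

```python
def pick_lab_sample(hosts):
    """Pick one host per distinct (model, hypervisor) combination."""
    sample = {}
    for host in hosts:
        key = (host["model"], host["hypervisor"])
        # Keep only the first host seen for each combination.
        sample.setdefault(key, host["name"])
    return sorted(sample.values())


if __name__ == "__main__":
    production = [  # hypothetical asset records
        {"name": "srv01", "model": "model-a", "hypervisor": "VMware ESXi"},
        {"name": "srv02", "model": "model-a", "hypervisor": "VMware ESXi"},
        {"name": "srv03", "model": "model-b", "hypervisor": "Citrix XenServer"},
    ]
    print(pick_lab_sample(production))
```

Testing the tool against each selected host before general deployment should surface most model-specific gaps in metric coverage.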
Network connectivity failures
If hardware or hypervisor compatibility is not the issue, chances are that metric reporting is being disrupted by network connectivity issues or misconfiguration. This often occurs when the tool is initially deployed, after a major server refresh or after a crash recovery that fails systems over.
For example, network settings that govern systems management communication are often configured through the agent -- if the agent is misconfigured (e.g., reporting to a systems management server at an incorrect IP address), the managed server will not provide metrics for collection and reporting. Use caution when retrieving metrics from remote systems as well. You may need to explicitly enable the system management software for remote collection. In either case, review the documentation for the system management software and configure each system according to the manufacturer's recommendations.
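An agent that reports to the wrong address can be caught by comparing each agent's configured target with the management server's actual address. This is a minimal sketch, assuming the configured targets have already been gathered into a dictionary. The function name, hostnames and addresses are hypothetical, and the addresses come from a range reserved for documentation.

```python
import ipaddress


def find_misconfigured_agents(agent_targets, management_ip):
    """Return hosts whose agent reports to an address other than management_ip."""
    expected = ipaddress.ip_address(management_ip)
    wrong = []
    for host, target in sorted(agent_targets.items()):
        try:
            if ipaddress.ip_address(target) != expected:
                wrong.append(host)
        except ValueError:
            # The configured value is not a valid IP address at all.
            wrong.append(host)
    return wrong


if __name__ == "__main__":
    targets = {  # hypothetical agent settings
        "esx01": "192.0.2.5",
        "esx02": "192.0.2.50",
        "esx03": "not-an-ip",
    }
    print(find_misconfigured_agents(targets, "192.0.2.5"))
```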
Finally, be sure to review any secure connectivity at the systems management server. For example, the management server may use Secure Shell (SSH) to encrypt management communication, but this first requires the SSH service to start up on the management server. If the SSH service fails to start, the tool may fail to collect any metrics from the environment (or from the remote data center).
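A quick first check is whether anything is listening on the management server's SSH port. The sketch below assumes SSH is on the default TCP port 22, so adjust the port if your environment differs. The function name is hypothetical. A successful connection only shows that a service accepts connections on that port, not that SSH itself is healthy.

```python
import socket


def port_is_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

If the function returns False for the management server, check the SSH service status on that server before investigating the agents.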
Comprehensive management of virtualized servers involves collecting, processing and reporting key performance metrics that provide insight into each system's health and utilization. But implementing a management tool to provide the same suite of metrics on every system can become a serious challenge. By understanding the common causes of data collection issues, IT professionals are better equipped to troubleshoot and correct the problem.