In years past, the goal of troubleshooting was to find what's "broken" and then implement the proper fix. Perhaps this meant exchanging a failed network cable, replacing a crashed disk drive or identifying a bad dual in-line memory module in a server. Today, however, the idea of what is "broken" has shifted. As businesses come to regard IT as a service provider and broker, the goal isn't so much about fixing things as it is about figuring out whether applications and services are delivering the availability and performance that the business needs.
We don't worry as much about break-and-fix hardware anymore -- load-balanced clusters can usually keep applications available, and snapshots can restore a recent application state when things really go sideways. What we really want to know is whether an application's latency is acceptable, whether the number of transactions per second is within tolerable limits and so on. And we rely on an array of powerful tools, like application performance monitoring (APM) software, to give us the metrics we need.
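To make the idea of "acceptable" latency and "tolerable" transaction rates concrete, here is a minimal Python sketch. It is not how any particular APM product works; the function names, the choice of the 95th percentile and the threshold parameters are all assumptions made for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_targets(latencies_ms, window_seconds, max_p95_ms, min_tps):
    """Check one measurement window against hypothetical business targets.

    latencies_ms   -- one latency reading per completed transaction
    window_seconds -- length of the measurement window
    max_p95_ms     -- assumed ceiling for 95th percentile latency
    min_tps        -- assumed floor for transactions per second
    """
    tps = len(latencies_ms) / window_seconds
    return percentile(latencies_ms, 95) <= max_p95_ms and tps >= min_tps
```

A check like this treats the application as "broken" when it misses a target, even if every piece of hardware underneath it is healthy.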
Containers fit into this realm of performance troubleshooting, but the time horizon has shrunk to almost negligible levels. Unlike physical machines that might run for years or VMs that run for months, a container can be spun up by orchestration and automation, run and then released again in just a few seconds -- maybe less.
This impacts several aspects of container performance troubleshooting. First, beyond APM -- is application X running right? -- IT administrators will need to know how to follow container resource usage patterns over time: how many containers are running at what times of day, where the containers are being deployed and how that activity is translating to CPU, memory, storage and network traffic. IT professionals will need tools that can translate APM metrics into more granular resource tracking and reporting so people can tell when it's time to upgrade or repair hardware. It's this potential fluidity in resource demand that makes containers and demand-based scalability so attractive for public cloud deployments.
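As a rough illustration of following usage patterns over time, the Python sketch below groups resource readings by hour of day. The Sample record and its fields are a hypothetical schema, not the output format of any real monitoring tool, and storage and network traffic are left out to keep it short.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sample:
    """One resource reading for one container (hypothetical schema)."""
    container_id: str
    host: str
    timestamp: datetime
    cpu_cores: float
    memory_mb: float


def usage_by_hour(samples):
    """Summarize which containers ran, where, and what they used per hour of day."""
    buckets = defaultdict(
        lambda: {"ids": set(), "hosts": set(), "cpu": 0.0, "mem": 0.0, "n": 0}
    )
    for s in samples:
        b = buckets[s.timestamp.hour]
        b["ids"].add(s.container_id)
        b["hosts"].add(s.host)
        b["cpu"] += s.cpu_cores
        b["mem"] += s.memory_mb
        b["n"] += 1
    # Report hours in ascending order with per-sample averages.
    return {
        hour: {
            "distinct_containers": len(b["ids"]),
            "hosts": sorted(b["hosts"]),
            "avg_cpu_cores": b["cpu"] / b["n"],
            "avg_memory_mb": b["mem"] / b["n"],
        }
        for hour, b in sorted(buckets.items())
    }
```

A summary like this shows when container counts peak and which hosts carry the load, which is the kind of trend that signals when hardware needs attention.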
Second, IT professionals will probably not be able to see the impact of individual containers on an application workload. A container that spins up and releases after three seconds will probably leave no perceivable effect on the application, but there can certainly be errors and alerts to sort out through some kind of management dashboard. Container performance troubleshooting will rely heavily on logs to record container activities and log analytics to correlate those activities to system logs, APM results and other log sources.
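One simple way to picture that correlation is to match log entries to container lifetimes by timestamp. The Python sketch below assumes the start and stop times have already been extracted from the container logs; the record types, field names and the two-second slack window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Lifetime:
    """When one short-lived container started and stopped (hypothetical record)."""
    container_id: str
    started: datetime
    stopped: datetime


@dataclass
class LogEntry:
    """One line from a system log, APM feed or other source (hypothetical record)."""
    timestamp: datetime
    source: str
    message: str


def correlate(lifetimes, entries, slack=timedelta(seconds=2)):
    """Map each container ID to the log entries that fall within its lifetime.

    The slack pads both ends of the lifetime, since an alert may be logged
    slightly before or after the container itself is running.
    """
    result = {lt.container_id: [] for lt in lifetimes}
    for entry in entries:
        for lt in lifetimes:
            if lt.started - slack <= entry.timestamp <= lt.stopped + slack:
                result[lt.container_id].append(entry)
    return result
```

Matching on time alone is crude, and real log analytics tools typically also use identifiers such as host or request IDs, but it shows why complete, timestamped logs matter when the container itself is gone in seconds.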
Third, container performance troubleshooting will need to track and report the intricate interdependencies that can arise between containers -- especially in complex container architectures, such as microservices-based applications. IT professionals will need to see how a change in one container cluster affects upstream and downstream container clusters to offer cause-and-effect insights into application behavior. For example, watching the number of API calls between containers can help gauge traffic and utilization, while watching the number of failed API calls can drive container scaling to help ensure continued performance.
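The failed-call example can be sketched as a simple scaling rule. This Python snippet is only an illustration: the 5% failure threshold, the one-replica step and the replica cap are assumed values, and a real orchestrator would apply its own policies.

```python
def failure_ratio(total_calls, failed_calls):
    """Fraction of API calls that failed; 0.0 when there was no traffic."""
    if total_calls == 0:
        return 0.0
    return failed_calls / total_calls


def desired_replicas(current, total_calls, failed_calls,
                     max_failure_ratio=0.05, max_replicas=10):
    """Suggest a replica count for a container cluster.

    Adds one replica when the failure ratio exceeds the assumed threshold,
    never going past max_replicas; otherwise keeps the current count.
    """
    if failure_ratio(total_calls, failed_calls) > max_failure_ratio:
        return min(current + 1, max_replicas)
    return current
```

For instance, a cluster of three containers that saw 100 failures in 1,000 calls would be nudged to four, while the same cluster with only 10 failures would stay at three.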
Ultimately, the real challenge of container performance troubleshooting will be keeping pace with the speed and scalability that are commonplace in ephemeral data center environments.