Right-sizing workloads through performance monitoring and capacity planning keeps a private cloud efficient and cost-effective. As you consolidate workloads, however, you will need historical performance data to assess the capacity needs of your centralized service.
If you aren't already doing so, gather a representative amount of performance data so you can accurately size a cloud setup. Don't just gather CPU and memory data; you also need I/O statistics such as network throughput, disk throughput, and disk latency.
While most right-sizing exercises focus on throughput, it is just as important to collect latency information for troubleshooting. Many applications are sensitive to latency. After migrating these apps to a centralized service, you may experience performance problems that are hard to diagnose without good historical data for comparison. These statistics can also help resolve disputes among storage, network, and systems administrators, because facts help defuse finger-pointing.
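One way to turn raw latency measurements into a baseline you can compare against after migration is to record percentiles as well as the average, since the average tends to hide the slow tail that latency-sensitive applications notice. The sketch below is illustrative only: the function name `latency_baseline` and the choice of the 50th and 95th percentiles are assumptions, not something a particular monitoring tool prescribes.

```python
from statistics import mean, quantiles


def latency_baseline(samples_ms):
    """Summarize latency samples (in milliseconds) for later comparison.

    Hypothetical helper: returns the mean, median (p50), 95th percentile
    (p95), and maximum of the samples.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(..., n=100) returns 99 cut points:
    # index 49 is the 50th percentile, index 94 is the 95th.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": mean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "max": max(samples_ms),
    }


# Example: mostly fast disk I/O with a few slow outliers (made-up numbers).
before_migration = latency_baseline([4, 5, 5, 6, 5, 4, 6, 5, 48, 52])
```

Capturing the same summary before and after a migration gives storage, network, and systems administrators a shared set of numbers to work from.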
If you don't have historical performance data to base your sizing on and don't have time to gather any during the year's peak load periods, you might opt for load testing instead. In a virtual environment, admins can isolate and test a virtual machine (VM) on a single piece of hardware where it cannot interfere with other workloads.
It may also be possible for applications that use network load balancers to have part of their service moved into the cloud while the legacy portion stays out. This can give an accurate picture of the needs of a service under an actual load, and allow for a period of adjustment, as application administrators and developers build confidence in your cloud.
Documentation as a capacity planning strategy
Documentation is key for capacity planning. You need to document server configurations to go along with the performance data. Why? A five-year-old server is three generations older than the new servers you're moving to. As a rule of thumb, each new generation of server is twice as fast as the last one, so three generations amount to roughly an eightfold speedup. It isn't uncommon to see an application that takes 80% of a five-year-old server's CPU take only 10% of a new server's CPU. While planning for growth is a good idea, it would be a serious financial mistake to plan for 80% load in this case.
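The arithmetic behind that example can be sketched in a few lines. This is a rough illustration, not a sizing formula: the function name `projected_cpu_percent` is hypothetical, and the doubling per generation is the article's rule of thumb rather than a measured figure, so substitute benchmark results for your own hardware where you have them.

```python
def projected_cpu_percent(old_cpu_percent, generations, speedup_per_generation=2.0):
    """Estimate CPU utilization on newer hardware.

    Assumes performance multiplies by `speedup_per_generation` with each
    server generation (a rule of thumb, not a guarantee).
    """
    if generations < 0 or speedup_per_generation <= 0:
        raise ValueError("generations must be >= 0 and speedup must be > 0")
    return old_cpu_percent / (speedup_per_generation ** generations)


# 80% of a five-year-old server, three generations back:
print(projected_cpu_percent(80, 3))  # 10.0
```

Recording the server generation alongside each performance sample is what makes this kind of adjustment possible later.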
You also need documentation to understand the relationships between servers and services and to understand commitments like service-level agreements. Does the service you're centralizing need a database? Do the customers expect a particular level of service? Virtualization promises that you can give an app only what it needs for performance, thereby saving money. The problem with that, of course, is that you need to know what level of performance the app or users need.
A number of commercial tools on the market can help you devise capacity planning strategies, especially for virtualized clouds. The simplest of these tools, such as VKernel’s CapacityView, can be deceptive because they take the average size of your VMs and divide the remaining capacity by that average to determine how many more VMs you can add. Such calculations can result in bad assumptions about your environment.
The tools may assume, for example, that all unreserved CPU and memory are idle. That's a very bad assumption if memory or CPU reservations do not exist in your environment. Likewise, they sometimes fail to account for N+1 cluster design, in which the "+1" host is held back for failover and should not be counted as usable capacity.
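To see how much these assumptions matter, compare the simple average-VM estimate with one that sets aside a failover host. The sketch below is a simplified illustration with hypothetical function names and made-up numbers. It assumes identical hosts, a single resource (memory, in GB), and that "used" capacity comes from measured usage rather than reservations.

```python
def naive_vm_headroom(total_capacity, used_capacity, vm_count):
    """Average-VM estimate: remaining capacity divided by mean VM size."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    average_vm = used_capacity / vm_count
    return int((total_capacity - used_capacity) // average_vm)


def n_plus_one_vm_headroom(host_capacity, host_count, used_capacity, vm_count):
    """Same estimate, but one host is reserved for failover (N+1)."""
    if vm_count <= 0 or host_count < 2:
        raise ValueError("need at least one VM and two hosts")
    usable = host_capacity * (host_count - 1)  # exclude the "+1" host
    average_vm = used_capacity / vm_count
    return max(0, int((usable - used_capacity) // average_vm))


# Four hosts with 256 GB each; 64 VMs currently use 512 GB in total.
print(naive_vm_headroom(4 * 256, 512, 64))       # 64
print(n_plus_one_vm_headroom(256, 4, 512, 64))   # 32
```

In this example, honoring the failover host halves the apparent headroom, which is the kind of gap worth catching before the numbers reach budget planners.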
Make sure you test a capacity planning tool thoroughly and are confident in the results before releasing the numbers to decision makers and budget planners. It's worthwhile to proceed slowly to reduce pressure on capacity planning. Many organizations consolidate services as the hardware warranties on their physical hosts expire. This often leads to a gradual move to cloud and cloud-like operations, and allows time to compensate for errors in capacity planning and general migration issues.
It may be a point of pride to say you have a standardized, centralized, automated private cloud, but there's less to brag about if things don't work, so take your time when right-sizing your workloads.