How should we approach a server consolidation project, and how much spare capacity should we leave on each server?
Consolidation projects are important steps toward optimizing a data center, but they involve a great deal of consideration and planning, rather than simply placing more workloads onto fewer servers as resources allow. IT planners must understand the relationships and dependencies between workloads, provide adequate network bandwidth and storage IOPS to handle the demands of multiple VMs, and decide on a coherent strategy for workload failover and restoration. Any server consolidation plan should be implemented in cautious phases and tested thoroughly before being rolled out to the entire data center.
One issue many companies struggle with when developing a server consolidation plan is whether to leave spare capacity on each server or maintain dedicated standby servers. The first approach calls for leaving spare resources on some (or even all) consolidated servers. This kind of "headroom" allows the workloads on one troubled server to be migrated or restarted on other servers with room available until the troubled server is repaired. The disadvantage is that it leaves some computing resources unused. In the second approach, servers are fully consolidated (leaving no "headroom" available), but one or more standby servers are kept ready to receive failover workloads if the need arises.
Ultimately, both approaches are perfectly acceptable -- the choice depends on how you plan to respond to failover situations.
Recent virtualization adopters typically implement low levels of consolidation, which can significantly reduce the physical server count while still leaving ample computing resources available on many of the remaining systems. In this situation, an administrator can redistribute workloads from a troubled server to other servers without bringing extra systems online.
The landscape changes a bit for experienced virtualization adopters that have systematically maximized the consolidation on most servers. In this case, it may be impossible to find failover capacity on highly consolidated servers, so a small contingent of extra systems may be kept on standby to accept workloads when problems arise with a production server.
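As a back-of-the-envelope illustration of the headroom approach described above -- not a substitute for real capacity-planning tools -- the basic N+1 question ("if this server fails, can the others absorb its load?") can be sketched in a few lines of Python. The host names and capacity units here are hypothetical:

```python
def can_absorb_failure(servers, failed):
    """Return True if the spare capacity on the remaining servers
    can absorb the load of one failed server (N+1 headroom check)."""
    spare = sum(s["capacity"] - s["load"] for s in servers if s is not failed)
    return spare >= failed["load"]

# Hypothetical three-host cluster; capacity and load in arbitrary units.
cluster = [
    {"name": "host-01", "capacity": 100, "load": 70},
    {"name": "host-02", "capacity": 100, "load": 60},
    {"name": "host-03", "capacity": 100, "load": 65},
]

# Spare on host-02 and host-03 is 40 + 35 = 75, which covers host-01's 70.
print(can_absorb_failure(cluster, cluster[0]))  # True
```

A fully consolidated cluster would fail this check for every host, which is exactly the situation where a standby server becomes the practical alternative.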
One important consideration for any server consolidation plan is to place complementary workloads together wherever possible. For example, a database and a customer relationship management (CRM) system may coexist well on the same server, where the two workloads can exchange data directly instead of sending queries and results across the network. This can improve the performance of both workloads by reducing their dependence on the local network.
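The co-location idea above can be expressed as a simple affinity rule during placement. The following Python sketch is a minimal, hypothetical illustration -- it assigns workloads round-robin but forces each declared pair (such as an app and the database it queries) onto the same host, and it deliberately ignores capacity limits for clarity:

```python
from itertools import cycle

def place_with_affinity(workloads, hosts, affinity_pairs):
    """Greedily assign workloads to hosts, keeping each affinity pair
    (e.g., an app and its database) on the same host. Capacity limits
    are ignored in this simplified sketch."""
    placement = {}
    next_host = cycle(hosts)
    # Place paired workloads first so each pair shares one host.
    for a, b in affinity_pairs:
        host = next(next_host)
        placement[a] = host
        placement[b] = host
    # Spread the remaining workloads across hosts round-robin.
    for w in workloads:
        if w not in placement:
            placement[w] = next(next_host)
    return placement

# Hypothetical workload and host names.
workloads = ["crm-app", "crm-db", "web-01", "batch-01"]
hosts = ["host-a", "host-b"]
pairs = [("crm-app", "crm-db")]

result = place_with_affinity(workloads, hosts, pairs)
print(result["crm-app"] == result["crm-db"])  # True: the pair shares a host
```

A production placement engine would also weigh CPU, memory, and I/O headroom; the point here is only that affinity between chatty workloads is a constraint worth encoding explicitly.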
Consolidation projects are rarely one-off, and many organizations undertake several rounds of consolidation as virtualization expertise grows and computing platforms evolve.
Virtualization and consolidation are typically not "all-or-nothing" practices. For example, virtualization is often introduced as a pilot program and then expands across the data center as staff members master the technology and its value becomes clear. Consolidation is usually approached in the same spirit. Early efforts often start small, placing just a few workloads on each physical system. This yields major hardware savings yet leaves plenty of resources available. As confidence improves and value is demonstrated, organizations implement additional waves of consolidation to place more workloads onto fewer systems.
As consolidation increases to optimize system utilization, the savings are less pronounced, but the more experienced IT staff is far better equipped to handle optimization without serious errors or oversights. So the lesson here -- as with other major IT deployments -- is to approach any server consolidation plan in phases. Start small with non-critical workloads, and then systematically embrace more important workloads over time.
This was first published in November 2013