Sizing a server for virtualized clusters requires a choice between a few large, powerful servers and many small, inexpensive ones. Administrators must weigh the advantages of each setup against their particular use cases.
The large-versus-small debate also raises the question of whether to scale up, concentrating cluster power in a few efficiently used big boxes, or to scale out across many popcorn servers, as is done in the cloud.
There is no one-size-fits-all answer, because sizing a server depends on the workloads the cluster must run.
Popcorn servers perform better with apps designed for scale-out infrastructure. Popcorn servers grow by adding more copies of the app stack. This is the essence of cloud computing: growth by cloning.
But any cloud service provider's instance portfolio shows that even the cloud needs both small instances and large ones. Popcorn servers won't work if a job needs more memory than a single node offers, for example.
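Growth by cloning can be sketched as a simple capacity check. The 64 GB node size and 5,000-requests-per-node figures below are illustrative assumptions, not vendor numbers:

```python
import math

def popcorn_fits(job_mem_gb: float, node_mem_gb: float = 64) -> bool:
    """A scale-out cluster grows by cloning the app stack, but a single
    job still has to fit within one node's memory."""
    return job_mem_gb <= node_mem_gb

def nodes_for_load(total_requests: int, requests_per_node: int = 5000) -> int:
    """Growth by cloning: add identical nodes until capacity covers demand."""
    return math.ceil(total_requests / requests_per_node)

print(popcorn_fits(48))        # fits a 64 GB popcorn node
print(popcorn_fits(256))       # needs a big-memory server instead
print(nodes_for_load(42000))   # 9 clones of the app stack
```

The memory check is the hard limit the paragraph above describes: no amount of cloning helps a job that cannot fit on one node.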
Big servers might seem the obvious strategy because they can be fragmented down into the same small instances that popcorn servers support. The key here is use cases. Serving many small, independent instances is a natural fit for popcorn servers, each hosting a handful of VMs on cheap local storage.
A big server can accomplish the same thing, but it requires multiple expensive solid-state drives (SSDs) to handle huge instances, as well as much more DRAM per core. The cost per instance, in this case, favors small servers, though the coming migration from hard disk drives to all-SSD servers might affect that decision.
At the other end of the spectrum, in-memory databases run best on large servers. Having fewer servers reduces the latency and bandwidth required for communication between the boxes, and it also limits the number of excursions for a piece of data to another box. Using fewer servers makes using faster links -- 100 Gigabit Ethernet (GbE) versus 10 GbE -- more affordable, which further improves cluster performance.
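The excursion argument lends itself to back-of-the-envelope math. Assuming data is spread evenly across the cluster, the fraction of lookups that must leave the local box is (N - 1) / N, so consolidating onto fewer, larger servers cuts cross-box traffic:

```python
def remote_fraction(num_servers: int) -> float:
    """With data spread evenly across num_servers boxes, the fraction of
    lookups that must cross the network to another box is (N - 1) / N."""
    return (num_servers - 1) / num_servers

# Fewer, larger servers mean fewer off-box excursions:
for n in (4, 16, 64):
    print(n, round(remote_fraction(n), 3))
```

Even at four servers, three out of four lookups are remote, which is why the link speed between boxes matters so much for in-memory databases.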
Compare costs and performance
Size and performance don't correlate linearly. Popcorn servers are inexpensive, with 10 GbE on the motherboard. The units are cheap to make and sell in huge quantities, and there are many competitive vendors, which further drives down prices.
A huge server is close to custom-made. Packing in enough memory requires the latest high-density, dual in-line memory modules, and the SSDs are likely top-line nonvolatile memory express (NVMe) units. Together, these push the price much higher.
There are also strains on the rest of the infrastructure. Power systems for popcorn units deliver single-phase power to the servers, while large units need two- or even three-phase power, which can be expensive. Popcorn CPUs cost around $100 each, whereas a top-line, 22-core CPU can cost $2,000 or more.
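A rough cost-per-instance comparison can be sketched from these figures. The CPU prices come from the text above; the chassis cost, DRAM/SSD cost and instance counts are hypothetical placeholders:

```python
def cost_per_instance(server_cost: float, instances: int) -> float:
    """Amortize the box price over the VM instances it hosts."""
    return server_cost / instances

# CPU prices from the article; all other figures are illustrative only.
popcorn = cost_per_instance(100 + 900, 8)          # $100 CPU + ~$900 node, 8 VMs
big_box = cost_per_instance(2 * 2000 + 12000, 96)  # two $2,000 CPUs + DRAM/SSD, 96 VMs
print(round(popcorn, 2), round(big_box, 2))
```

With these placeholder numbers the popcorn node comes out cheaper per instance, matching the article's claim, but the gap narrows as the big box hosts more instances per dollar of DRAM.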
The rise of data analytics has made GPU instances another factor. Administrators sizing a server must consider how many GPUs and how much memory is necessary. Surprisingly, the answer doesn't necessarily favor big boxes. Super Micro Computer delivers a 4-GPU 1U server, for example. These comfortably run mid- and low-end GPU cards, but will probably fry the latest high-end add-in cards.
Still, if an app can handle the memory size of these GPUs and doesn't require huge DRAM spaces, the 1U is a good fit. If more power is necessary, there are larger boxes that can handle eight top-line GPUs.
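GPU sizing follows the same arithmetic as memory sizing: count the GPUs needed to hold the working set, then the boxes needed to hold the GPUs. The 60 GB working set and 16 GB cards below are illustrative assumptions:

```python
import math

def gpus_needed(model_mem_gb: float, gpu_mem_gb: float) -> int:
    """GPUs required just to hold the working set in GPU memory."""
    return math.ceil(model_mem_gb / gpu_mem_gb)

def servers_needed(total_gpus: int, gpus_per_server: int) -> int:
    """Boxes that implies for a given chassis, e.g. a 4-GPU 1U unit."""
    return math.ceil(total_gpus / gpus_per_server)

gpus = gpus_needed(model_mem_gb=60, gpu_mem_gb=16)    # 4 mid-range GPUs
print(gpus, servers_needed(gpus, gpus_per_server=4))  # fits one 4-GPU 1U box
```

If the working set instead demanded eight top-line cards, the same arithmetic would point at the larger 8-GPU chassis mentioned above.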
Evaluate power efficiency and reliability
It's a common myth that large, multiphase power supplies are more efficient. Popcorn servers, however, use the same high-efficiency supply designs, and idle units can simply be powered off, so a cluster of small boxes wastes little energy in practice.
Cooling can be a severe design constraint for small 1U servers. These servers are tightly packed and run hot, which reduces reliability. In this server sizing context, big servers have an edge because they have open airflow and big fans. Cloud providers compensate for this by powering off the small units that fail and transferring workloads to one of the many other popcorn units. This isn't an option if there are only a few big servers, so repairs must be immediate.
A typical cloud has sub-clusters with popcorn units and big servers. If an administrator is sizing a server and the available orchestration can't manage such a heterogeneous environment, it's likely time to look for another virtualization service. This is a common problem with hyper-converged systems, where mixing different platforms is theoretically possible but frowned upon in practice.